(54) Title: METHODS AND COMPOSITIONS FOR GENOMIC MODIFICATION
(57) Abstract

The present invention provides methods of site-specifically integrating a polynucleotide sequence of interest in a genome of a eucaryotic cell, as well as enzymes, polypeptides, and a variety of vector constructs useful therefor. In the method, (i) a targeting construct comprising, for example, a first recombination site and a polynucleotide sequence of interest, and (ii) a site-specific recombinase are introduced into the cell. The genome of the cell comprises a second recombination site. Recombination between the first and second recombination sites is facilitated by the site-specific recombinase. The invention describes compositions, vectors, and methods of use thereof for the generation of transgenic cells, tissues, plants, and animals. The compositions, vectors, and methods of the present invention are also useful in gene therapy techniques.

Methods and Compositions for Genomic Modification

Field of the Invention 

The present invention relates to the field of biotechnology, and more specifically to the field of genomic modification. Disclosed herein are compositions, vectors, and methods of use thereof for the generation of transgenic cells, tissues, plants, and animals. The compositions, vectors, and methods of the present invention are also useful in gene therapy techniques.

Background of the Invention 

Permanent genomic modification has been a long-sought goal since the discovery that many human disorders are the result of genetic mutations that could, in theory, be corrected by providing the patient with a non-mutated gene. Permanent alterations of the genomes of cells and tissues would also be valuable for research applications, commercial products, protein production, and medical applications. Furthermore, genomic modification in the form of transgenic animals and plants has become an important approach for the analysis of gene function, the development of disease models, and the design of economically important animals and crops.

A major problem with many genomic modification methods associated with gene therapy is their lack of permanence. Life-long expression of the introduced gene is required for correction of genetic diseases. Indeed, sustained gene expression is required in most applications, yet current methods often rely on vectors that provide only a limited duration of gene expression. For example, gene expression is often curtailed by shut-off of integrated retroviruses, destruction of adenovirus-infected cells by the immune system, and degradation of introduced plasmid DNA (Anderson, W.F., Nature 329:25-30, 1998; Kay et al., Proc. Natl. Acad. Sci. USA 94:12744-12746, 1997; Verma and Somia, Nature 389:239-242, 1997). Even in shorter-term applications, such as therapy designed to kill tumor cells or to discourage regrowth of endothelial tissue after restenosis surgery, the short lifetime of gene expression achieved by current methods often limits the usefulness of the technique.

One method for creating permanent genomic modification is to employ a strategy whereby the introduced DNA becomes part of (i.e., is integrated into) the existing chromosomes. Of existing methods, only retroviruses provide for efficient integration. Retroviral integration is random, however, so the added gene sequences can integrate in the middle of another gene, or into a region in which the added gene sequence is inactive. In addition, a different insertion is created in each target cell. This situation creates safety concerns and produces an undesirable loss of control over the procedure.

Adeno-associated virus (AAV) often integrates at a specific region in the human genome. However, vectors derived from AAV do not integrate site-specifically, owing to deletion of the toxic rep gene (Flotte and Carter, Gene Therapy 2:357-362, 1995; Muzyczka, Curr. Topics Microbiol. Immunol. 158:97-129, 1992). The small percentage of the AAV vector population that eventually integrates does so randomly. Other methods for genomic modification include transfection of DNA using calcium phosphate co-precipitation, electroporation, lipofection, microinjection, protoplast fusion, particle bombardment, or the Ti plasmid (for plants). All of these methods produce random integration at low frequency. Homologous recombination produces site-specific integration, but the frequency of such integration is very low.

WO 00/11155 PCT/US99/18987

Another method that has been considered for the integration of heterologous nucleic acid fragments into a chromosome is the use of a site-specific recombinase (an example using Cre is described below). Site-specific recombinases catalyze the insertion or excision of nucleic acid fragments. These enzymes recognize relatively short, unique nucleic acid sequences that serve for both recognition and recombination. Examples include Cre (Sternberg and Hamilton, J. Mol. Biol. 150:467-486, 1981), Flp (Broach et al., Cell 29:227-234, 1982), and R (Matsuzaki et al., J. Bacteriol. 172:610-618, 1990).

One of the most widely studied site-specific recombinases is the enzyme Cre from bacteriophage P1. Cre recombines DNA at a 34-basepair sequence called loxP, which consists of two 13-basepair palindromic sequences (inverted repeats) flanking an 8-basepair core sequence. Cre can direct site-specific integration of a loxP-containing targeting vector to a chromosomally placed loxP target in both yeast and mammalian cells (Sauer and Henderson, New Biol. 2:441-449, 1990). Use of this strategy for genomic modification, however, requires that a chromosome first be modified to contain a loxP site (because this sequence is not known to occur naturally in any organism but bacteriophage P1), a procedure which suffers from low frequency and unpredictability as discussed above. Furthermore, the net integration frequency is low due to the competing excision reaction, which is also mediated by Cre. Similar concerns arise in the conventional use of other well-known site-specific recombinases.
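The 13-8-13 architecture of loxP described above can be checked in a few lines of Python. This is a minimal sketch: the 34-bp loxP sequence below is the commonly published wild-type sequence and is not reproduced from this specification, and the function names are illustrative. It also shows that the "palindromic" 13-bp elements are inverted repeats, i.e., the right element is the reverse complement of the left.

```python
# Sketch: split a lox-type site into arm / core / arm and test the arms.
LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"  # commonly published 34-bp loxP

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_site(site: str, arm_len: int = 13, core_len: int = 8):
    """Split a lox-type site into (left arm, core, right arm)."""
    expected = 2 * arm_len + core_len
    if len(site) != expected:
        raise ValueError(f"expected {expected} bp, got {len(site)}")
    left = site[:arm_len]
    core = site[arm_len:arm_len + core_len]
    right = site[arm_len + core_len:]
    return left, core, right


left, core, right = split_site(LOXP)
print(left, core, right)
# The arms are inverted repeats: right arm == reverse complement of left arm.
print(right == reverse_complement(left))
```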

A need still exists, therefore, for a convenient means by which chromosomes can be permanently modified in a site-specific manner. The present invention addresses that need.

Brief Description of the Invention 

Accordingly, in one embodiment, the present invention is directed to a method of site-specifically integrating a polynucleotide sequence of interest in a genome of a eucaryotic cell. The method comprises introducing (i) a circular targeting construct, comprising a first recombination site and the polynucleotide sequence of interest, and (ii) a site-specific recombinase into the eucaryotic cell, wherein the genome of the cell comprises a second recombination site native to the genome and recombination between the first and second recombination sites is facilitated by the site-specific recombinase. The cell is maintained under conditions that allow recombination between the first and second recombination sites, and the recombination is mediated by the site-specific recombinase. The result of the recombination is site-specific integration of the polynucleotide sequence of interest in the genome of the eucaryotic cell.



The recombinase may be introduced into the cell before, concurrently with, or after introducing the circular targeting construct. Further, the circular targeting construct may comprise other useful components, such as a bacterial origin of replication and/or a selectable marker.

In certain embodiments, the recombinase may facilitate recombination between two sites designated recombinase-mediated-recombination sites (RMRS), and the RMRS comprises a first DNA sequence (RMRS5'), a core region A, and a second DNA sequence (RMRS3') in the relative order RMRS5'-core region A-RMRS3'. In this embodiment, for example, RMRS may be a loxP site or an FRT site and the recombinase may be Cre or FLP, respectively.

In additional embodiments, (i) the second recombination site is a pseudo-RMRS site, and the second recombination site comprises a first DNA sequence (attT5'), a core region B, and a second DNA sequence (attT3') in the relative order attT5'-core region B-attT3', and (ii) the first recombination site is a hybrid-recombination site comprising RMRS5'-core region B-RMRS3' or attT5'-core region B-attT3'.

In yet further embodiments, the site-specific recombinase is a recombinase encoded by a phage selected from the group consisting of φC31, TP901-1, and R4. The recombinase may facilitate recombination between a bacterial genomic recombination site (attB) and a phage genomic recombination site (attP), and either (i) the second recombination site may comprise a pseudo-attP site and (ii) the first recombination site may comprise the attB site, or (i) the second recombination site may comprise a pseudo-attB site and (ii) the first recombination site may comprise the attP site.

In another embodiment, (i) attB comprises a first DNA sequence (attB5'), a bacterial core region, and a second DNA sequence (attB3') in the relative order attB5'-bacterial core region-attB3', (ii) attP comprises a first DNA sequence (attP5'), a phage core region, and a second DNA sequence (attP3') in the relative order attP5'-phage core region-attP3', and (iii) the recombinase mediates production of recombination-product sites that can no longer act as a substrate for the recombinase, the recombination-product sites comprising the relative order attB5'-recombination-product site-attP3' and attP5'-recombination-product site-attB3'.

In particularly preferred embodiments, (i) the second recombination site is a pseudo-attP site, and the second recombination site comprises a first DNA sequence (attT5'), a core region B, and a second DNA sequence (attT3') in the relative order attT5'-core region B-attT3', (ii) the first recombination site is an attB site comprising attB5'-bacterial core region-attB3', and (iii) the recombinase mediates production of recombination-product sites that can no longer act as a substrate for the recombinase, the recombination-product sites comprising the relative order attT5'-recombination-product site-attB3'-(polynucleotide of interest)-attB5'-recombination-product site-attT3'. Alternatively, (i) the second recombination site is a pseudo-attB site, and the second recombination site comprises a first DNA sequence (attT5'), a core region B, and a second DNA sequence (attT3') in the relative order attT5'-core region B-attT3', (ii) the first recombination site is an attP site comprising attP5'-phage core region-attP3', and (iii) the recombinase mediates production of recombination-product sites that can no longer act as a substrate for the recombinase, the recombination-product sites comprising the relative order attT5'-recombination-product site-attP3'-(polynucleotide of interest)-attP5'-recombination-product site-attT3'.
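The ordering of the recombination-product sites in these embodiments follows from opening the circular targeting construct at its core and splicing it into the genomic core. The sketch below models each DNA element as a text label; it is a bookkeeping aid under stated assumptions, not a simulation of strand exchange, and it does not model the fact that the product sites are no longer substrates for the recombinase. The function `integrate` and all element labels are hypothetical.

```python
from typing import List

PRODUCT = "recombination-product site"  # label for each crossover product


def integrate(genome: List[str], genome_core: int,
              donor_circle: List[str], donor_core: int) -> List[str]:
    """Insert a circular donor into a linear genome at aligned core regions.

    `genome_core` and `donor_core` are list indices of the core element in
    each molecule. Both cores are replaced by recombination-product sites,
    and the circle is opened so that it reads from the element after its
    core, around the circle, to the element just before its core.
    """
    n = len(donor_circle)
    opened = [donor_circle[(donor_core + 1 + i) % n] for i in range(n - 1)]
    return (genome[:genome_core] + [PRODUCT] + opened + [PRODUCT]
            + genome[genome_core + 1:])


# Hypothetical genomic pseudo-attP locus and circular attB donor.
genome = ["5' flank", "attT5'", "core region B", "attT3'", "3' flank"]
attb_donor = ["attB5'", "bacterial core region", "attB3'",
              "polynucleotide of interest"]
print(" - ".join(integrate(genome, 2, attb_donor, 1)))
```

Passing a donor whose elements are attP5', phage core region and attP3' instead yields the alternative pseudo-attB arrangement, attT5'-recombination-product site-attP3'-(polynucleotide of interest)-attP5'-recombination-product site-attT3'.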

In yet further embodiments, the site-specific recombinase is introduced into the cell as a polypeptide. In alternative embodiments, the site-specific recombinase is introduced into the cell as a polynucleotide encoding the recombinase, in which case an expression cassette, optionally carried on a transient expression vector, comprises the polynucleotide encoding the recombinase.

In another embodiment, the invention is directed to a vector for site-specific integration of a polynucleotide sequence into the genome of a eucaryotic cell. The vector comprises (i) a circular backbone vector, (ii) a polynucleotide of interest operably linked to a eucaryotic promoter, and (iii) a first recombination site, wherein the genome of the cell comprises a second recombination site native to the genome and recombination between the first and second recombination sites is facilitated by a site-specific recombinase.

In certain embodiments, the recombinase normally facilitates recombination between a bacterial genomic recombination site (attB) and a phage genomic recombination site (attP), and the first recombination site may be either attB or attP.

In still another embodiment, the invention is directed to a kit for site-specific integration of a polynucleotide sequence into the genome of a eucaryotic cell. The kit comprises (i) a vector as described above and (ii) a site-specific recombinase.

In another embodiment, the invention is directed to a eucaryotic cell having a modified genome. The modified genome comprises an integrated polynucleotide sequence of interest whose integration was mediated by a recombinase, wherein the integration was into a recombination site native to the eucaryotic cell genome and the integration created a recombination-product site comprising the polynucleotide sequence.

In certain embodiments, the recombination-product site comprises the components attT5'-recombination-product site-attB3' and attB5'-recombination-product site-attT3', wherein (i) the native recombination site is a pseudo-attP site, and the native recombination site comprises a first DNA sequence (attT5'), a core region B, and a second DNA sequence (attT3') in the relative order attT5'-core region B-attT3', (ii) the integrated polynucleotide sequence comprises a first recombination site comprising an attB site comprising attB5'-bacterial core region-attB3', and (iii) the recombinase mediates production of recombination-product sites that can no longer act as a substrate for the recombinase, the recombination-product sites comprising the relative order attT5'-recombination-product site-attB3'-(polynucleotide of interest)-attB5'-recombination-product site-attT3'. Alternatively, the recombination-product site comprises the components attT5'-recombination-product site-attP3' and attP5'-recombination-product site-attT3', wherein (i) the native recombination site is a pseudo-attB site, and the native recombination site comprises a first DNA sequence (attT5'), a core region B, and a second DNA sequence (attT3') in the relative order attT5'-core region B-attT3', (ii) the integrated polynucleotide sequence comprises a first recombination site comprising an attP site comprising attP5'-phage core region-attP3', and (iii) the recombinase mediates production of recombination-product sites that can no longer act as a substrate for the recombinase, the recombination-product sites comprising the relative order attT5'-recombination-product site-attP3'-(polynucleotide of interest)-attP5'-recombination-product site-attT3'.

In further embodiments, the subject invention is directed to transgenic plants and animals comprising at least one cell as described above, as well as methods of producing the same.

In yet other embodiments, the invention is directed to methods of treating a disorder in a subject in need of such treatment. The method comprises site-specifically integrating a polynucleotide sequence of interest in a genome of at least one cell of the subject, wherein the polynucleotide facilitates production of a product that treats the disorder in the subject. The site-specific integration may be carried out in vivo in the subject, or ex vivo in cells that are then introduced into the subject.

A further embodiment of the invention comprises cells, tissues, transgenic animals and/or plants whose genomes have been modified using the methods described herein.

In another aspect, the present invention provides a method of modifying a genome of a cell. In the method, an attB or an attP recombination site is introduced into the genome of a cell, wherein (i) the recombination site is recognized by a recombinase, and (ii) the cell normally does not comprise the attB or attP site. The vectors described herein and above are useful in the practice of this aspect of the invention. In a preferred embodiment, the cell that is being modified is a eucaryotic cell.

In yet another aspect, the present invention provides expression cassettes comprising a polynucleotide encoding a site-specific recombinase, wherein (i) the recombinase is encoded by a phage (typically selected from the group consisting of φC31, TP901-1, and R4), and (ii) the polynucleotide encoding the recombinase is operably linked to a eucaryotic promoter. The vectors described herein and above are useful in the practice of this aspect of the invention.

These and other embodiments of the present invention will readily occur to those of ordinary skill in the art in view of the disclosure herein.

Brief Description of the Figures 

Figures 1A through 1C are schematics of representative plasmids useful in evaluating the efficiency of pseudo-lox recombination sequences. Figure 1A shows an unmodified plasmid containing a gene for ampicillin resistance and a gene for β-galactosidase expression (lacZ) under control of the CMV promoter (pLCGl). Figure 1B shows the same plasmid with wild-type loxP sequences flanking the lacZ gene (pWTLox2). Figure 1C shows the plasmid with the ψlox h7q21 pseudo-lox recombination sequence on one side of lacZ and a lox sequence with wild-type palindromes and a pseudo-lox core on the other side (pψloxh7q21).

Figure 1D shows the DNA sequences of the lox sites from pWTLox2 (top line of Figure 1D) and plasmid pψloxh7q21 (bottom lines of Figure 1D).

Figure 2 shows the results of an excision assay performed in human cells as described in the examples. Each of the tested plasmids was transfected into human 293 cells along with a Cre expression plasmid. After 72 hours, DNA was transformed into E. coli and recombinants were scored. The transient excision frequency is expressed as a percentage, where the value for pWTLox2 is set at 100%.
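Figure 2 expresses each plasmid's transient excision frequency relative to pWTLox2. A minimal sketch of that normalization follows, assuming that scoring means counting recombinant colonies among all colonies recovered for each plasmid; the function name and the counts in the example are hypothetical and are not data from this document.

```python
def relative_excision_percent(recombinants: int, scored: int,
                              ref_recombinants: int, ref_scored: int) -> float:
    """Excision frequency of a test plasmid as a percentage of the reference.

    Assumed scoring: `recombinants` of `scored` colonies carry the excision
    product. The reference plasmid (pWTLox2 in Figure 2) is defined as 100%.
    """
    if min(scored, ref_scored, ref_recombinants) <= 0 or recombinants < 0:
        raise ValueError("counts must be positive (test recombinants may be 0)")
    test_freq = recombinants / scored
    ref_freq = ref_recombinants / ref_scored
    return 100.0 * test_freq / ref_freq


# Hypothetical counts, for illustration only.
print(relative_excision_percent(12, 400, 300, 500))   # a pseudo-lox plasmid
print(relative_excision_percent(300, 500, 300, 500))  # the reference itself
```

Normalizing to the reference within each experiment, rather than reporting raw fractions, lets plasmids be compared across transfections with different overall recovery.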

Figure 3 is a diagram of plasmids used in a transient integration assay performed in human cells as described in the examples. pRh7q21 (upper left) was the recipient for an integration event and included the chromosomal ψlox h7q21 site (open triangle), as well as the gene for tetracycline resistance. Similar control plasmids bearing either no lox site or the wild-type loxP site were also constructed. pDh7q21 (upper right) was the donor plasmid for integration and included a lox site (open triangle, loxψcore) comprising the 8-bp core from ψlox h7q21 and the wild-type loxP palindromes. The plasmid also carried two wild-type loxP sites (dark triangles). In the presence of Cre, the plasmid origin of replication and the ampicillin resistance gene are excised, resulting in integrants that do not have two plasmid origins. This excised by-product is shown in the lower right. The site-specific integration product, bearing lacZ flanked by hybrid lox sites (shaded triangles) in a tetracycline-resistant backbone, is shown at lower left. Parallel donor plasmids having, in place of ψlox h7q21, either no lox site or only wild-type loxP sites, were also constructed.

Figures 4A through 4E are schematic diagrams of representative plasmids used in demonstrating function of the φC31 integrase, as described in the examples. Figure 4A shows plasmid pInt, for expression of φC31 integrase in E. coli; Figure 4B depicts plasmid pCMVInt for expression of integrase in mammalian cells; Figure 4C depicts plasmid pBCPB+, an intramolecular integration assay vector; Figure 4D shows plasmid p220KattBfull, an EBV vector bearing attB, the target for integration events; Figure 4E shows plasmid pTSAD, the donor for integration events, bearing attP. KanR, AmpR, ChlorR, and HygR are genes for resistance to kanamycin, ampicillin, chloramphenicol, and hygromycin, respectively.

Figure 5 shows along the vertical axis the percent recombination obtained in the intramolecular integration assay in E. coli, described in Example 6, when various shortened versions of φC31 attB (left) and attP (right) were tested. The name of each site tested corresponds to the length of the att site in basepairs. The A and B of B33 indicate sites where the reduction of the site length from 34 bp to 33 bp occurred at the left or right ends of the site, respectively. Similar nomenclature is used for P39A and P39B. Full refers to the full-length attB.

Figure 6 shows the percent recombination obtained in the intramolecular integration assay performed in E. coli when various substitutions in the attB and/or attP cores were made. The first column shows the recombination frequency when attB bears the mutant sequence shown and attP remains wild-type, the second column shows the recombination frequency when attP bears the mutant sequence, while the third column shows the recombination frequency when both attB and attP bear the mutant core sequence shown. nd = not done. As the figure indicates, most changes in the core region are not well tolerated.

Figure 7 shows the results of a bimolecular integration assay performed in human cells as described in the examples. Results are shown for human cells carrying three EBV plasmids: p220K, a negative control lacking attB; p220KattB35, which carries the minimally sized attB; and p220KattBfull, carrying the full-sized attB. Integration frequencies are shown for experiments in which no DNA was transfected, in which either the integrase expression plasmid pCMVInt or the attP-bearing plasmid pTSAD alone was transfected, or in which pCMVInt and pTSAD were transfected together. Only the latter condition, in the presence of a plasmid bearing attB, led to integration events. Integration frequencies were corrected for transfection frequency to give the corrected integration frequencies in the last column. p220KattBfull produced the highest integration frequency, at 7.5%.
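The last column of Figure 7 reports integration frequencies corrected for transfection frequency. The specification does not give the formula, so the sketch below assumes the simplest correction, dividing the raw frequency by the fraction of cells that were transfected; what counts as a "scored" unit, the function name, and the example numbers are all hypothetical.

```python
def corrected_integration_percent(integrants: int, scored: int,
                                  transfection_fraction: float) -> float:
    """Raw integration frequency divided by the transfected fraction.

    Assumption: correcting for transfection frequency is a simple division
    by the fraction of cells that received DNA; the source does not state
    the exact formula.
    """
    if scored <= 0:
        raise ValueError("scored must be positive")
    if not 0.0 < transfection_fraction <= 1.0:
        raise ValueError("transfection_fraction must be in (0, 1]")
    raw_fraction = integrants / scored
    return 100.0 * raw_fraction / transfection_fraction


# Hypothetical numbers for illustration only.
uncorrected = corrected_integration_percent(15, 1000, 1.0)  # no correction
corrected = corrected_integration_percent(15, 1000, 0.25)   # 25% transfected
print(uncorrected, corrected)
```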

Figures 8A and 8B show pseudo-loxP sequences identified by computer search, as described in the Examples. The core sequences are shown in boldface type.

Detailed Description of the Invention 

Throughout this application, various publications, patents, and published patent applications are referred to by an identifying citation. The disclosures of these publications, patents, and published patent specifications referenced in this application more fully describe the state of the art to which this invention pertains.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, microbiology, cell biology and recombinant DNA, which are within the skill of the art. See, e.g., Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al. eds., 1987); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M.J. McPherson, B.D. Hames and G.R. Taylor eds., 1995); and ANIMAL CELL CULTURE (R.I. Freshney, ed., 1987).

As used in this specification and the appended claims, the singular forms "a," "an" and "the" include plural references unless the content clearly dictates otherwise. Thus, for example, reference to "an antigen" includes a mixture of two or more such antigens.

Definitions

"Recombinase" as used herein refers to a group of enzymes that can facilitate site-specific recombination between defined sites, where the sites are physically separated on a single DNA molecule or where the sites reside on separate DNA molecules. The DNA sequences of the defined recombination sites are not necessarily identical. Within this group are several subfamilies, including "Integrase" (including, for example, Cre and λ integrase) and "Resolvase/Invertase" (including, for example, φC31 integrase, R4 integrase, and TP901-1 integrase).

By "wild-type recombination site (RS/WT)" is meant a recombination site normally used by an integrase or recombinase. For example, λ is a temperate bacteriophage that infects E. coli. The phage has one attachment site for recombination (attP), and the E. coli bacterial genome has an attachment site for recombination (attB). Both of these sites are wild-type recombination sites for λ integrase. In the context of the present invention, wild-type recombination sites occur in the homologous phage/bacteria system. Accordingly, wild-type recombination sites can be derived from the homologous system and associated with heterologous sequences; for example, the attB site can be placed in other systems to act as a substrate for the integrase.

By "pseudo-recombination site (RS/P)" is meant a site at which a recombinase can facilitate recombination even though the site may not have a sequence identical to the sequence of its wild-type recombination site. A pseudo-recombination site is typically found in an organism heterologous to the native phage/bacterial system. For example, a φC31 integrase and a vector carrying a φC31 wild-type recombination site can be placed into a eucaryotic cell. The wild-type recombination sequence aligns itself with a sequence in the eucaryotic cell genome and the integrase facilitates a recombination event. When the sequence of the genomic site in the eucaryotic cell where the integration of the vector took place (via a recombination event between the wild-type recombination site in the vector and the genome) is examined, the sequence at the genomic site typically has some identity to, but may not be identical with, the wild-type bacterial genome recombination site. The recombination site in the eucaryotic cell is considered to be a pseudo-recombination site at least because the eucaryotic cell is heterologous to the normal phage/bacterial cell system. The size of the pseudo-recombination site can be determined through the use of a variety of methods including, but not limited to, (i) sequence alignment comparisons, (ii) secondary structural comparisons, (iii) deletion or point mutation analysis to find the functional limits of the pseudo-recombination site, and (iv) combinations of the foregoing. Pseudo-recombination sites typically occur naturally in the genomes of eucaryotic cells (i.e., the sites are native to the genome) and are functionally identified as described herein (e.g., see Examples).
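Method (i) above, sequence alignment comparison, can be illustrated with the simplest possible measure: ungapped percent identity between two pre-aligned, equal-length sites. This is a minimal sketch, not the procedure used in the Examples; real searches would normally use a gapped alignment tool. The wild-type reference is the commonly published loxP sequence, and the candidate is a hypothetical sequence made by introducing two substitutions.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


# Wild-type reference: the commonly published 34-bp loxP sequence.
WILD_TYPE = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"
# Hypothetical genomic candidate: wild type with two substitutions.
CANDIDATE = WILD_TYPE[:13] + "G" + WILD_TYPE[14:33] + "C"

print(f"{percent_identity(WILD_TYPE, CANDIDATE):.1f}% identical")
```

Identity alone does not establish that a site is a functional pseudo-recombination site; as the definition states, such sites are ultimately identified functionally.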

By "hybrid-recombination site (RS/H)" is meant a recombination site constructed from portions of wild-type and/or pseudo-recombination sites. As an example, a wild-type recombination site may have a short core region flanked by palindromes. In one embodiment of a "hybrid-recombination site", the short core region sequence of the hybrid-recombination site matches a core sequence of a pseudo-recombination site and the palindromes of the hybrid-recombination site match the wild-type recombination site. In an alternative embodiment, the hybrid-recombination site may be composed of flanking sites derived from a pseudo-recombination site and a core region derived from a wild-type recombination site. Other combinations of such hybrid-recombination sites will be evident to those having ordinary skill in the art, in view of the teachings of the present specification.
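Both hybrid embodiments amount to swapping the core between two sites of the same geometry. The sketch below assumes the lox-type 13-8-13 layout and uses the commonly published loxP sequence as the wild-type site; `PSEUDO` is a synthetic 34-bp placeholder, not a real genomic pseudo-site (the ψlox h7q21 sequence of Figure 1D is not reproduced here), and `hybrid_site` is a hypothetical helper.

```python
LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"  # commonly published wild-type loxP
ARM, CORE = 13, 8  # assumed lox-type layout: 13-bp arms flanking an 8-bp core


def hybrid_site(arms_from: str, core_from: str) -> str:
    """Build a hybrid site: flanking arms from one site, core from another."""
    for site in (arms_from, core_from):
        if len(site) != 2 * ARM + CORE:
            raise ValueError(f"site must be {2 * ARM + CORE} bp, got {len(site)}")
    core = core_from[ARM:ARM + CORE]
    return arms_from[:ARM] + core + arms_from[ARM + CORE:]


# Synthetic 34-bp placeholder standing in for a pseudo-recombination site.
PSEUDO = "ACGT" * 8 + "AC"

wt_arms_pseudo_core = hybrid_site(LOXP, PSEUDO)  # first embodiment
pseudo_arms_wt_core = hybrid_site(PSEUDO, LOXP)  # alternative embodiment
print(wt_arms_pseudo_core)
print(pseudo_arms_wt_core)
```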

A recombination site "native" to the genome, as used herein, means a recombination site that occurs naturally in the genome of a cell (i.e., the site is not introduced into the genome, for example, by recombinant means).

By "nucleic acid construct" is meant a nucleic acid sequence that has been constructed to comprise one or more functional units not found together in nature. Examples include circular, double-stranded, extrachromosomal DNA molecules (plasmids), cosmids (plasmids containing COS sequences from lambda phage), viral genomes comprising non-native nucleic acid sequences, and the like.

By "nucleic acid fragment of interest" is meant any nucleic acid fragment that one wishes to insert into a genome. Suitable examples of nucleic acid fragments of interest include therapeutic genes, marker genes, control regions, trait-producing fragments, and the like.

"Therapeutic genes" are those nucleic acid sequences which encode molecules that provide some therapeutic benefit to the host, including proteins, functional RNAs (antisense, hammerhead ribozymes), and the like. One well-known example is the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The primary physiological defect in cystic fibrosis is the failure of electrogenic chloride ion secretion across the epithelia of many organs, including the lungs. One of the most dangerous aspects of the disorder is the cycle of recurrent airway infections which gradually destroy lung function, resulting in premature death. Cystic fibrosis is caused by a variety of mutations in the CFTR gene. Since the problems arising in cystic fibrosis result from mutations in a single gene, the possibility exists that the introduction of a normal copy of the gene into the lung epithelia could provide a treatment for the disease, or effect a cure if the gene transfer were permanent.

Other disorders resulting from mutations in a single gene (known as monogenic disorders) include alpha-1-antitrypsin deficiency, chronic granulomatous disease, familial hypercholesterolemia, Fanconi anemia, Gaucher disease, Hunter syndrome, ornithine transcarbamylase deficiency, purine nucleoside phosphorylase deficiency, severe combined immunodeficiency disease (SCID)-ADA, X-linked SCID, hemophilia, and the like.

Therapeutic benefit in other disorders may also result from the addition of a protein-encoding therapeutic nucleic acid. For example, addition of a nucleic acid encoding an immunomodulating protein such as interleukin-2 may be of therapeutic benefit for patients suffering from different types of cancer.

A nucleic acid fragment of interest may additionally be a "marker nucleic acid" or "marker polypeptide". Marker genes encode proteins which can be easily detected in transformed cells and are, therefore, useful in the study of those cells. Marker genes are being used in bone marrow transplantation studies, for example, to investigate the biology of marrow reconstitution and the mechanism of relapse in patients. Examples of suitable marker genes include beta-galactosidase, green or yellow fluorescent proteins, chloramphenicol acetyl transferase, luciferase, and the like.

A nucleic acid fragment of interest may additionally be a control region. The term "control region" or "control element" includes all nucleic acid components which are operably linked to a DNA fragment and involved in the expression of a protein or RNA therefrom. An operable linkage is a linkage in which the regulatory DNA fragments and the DNA sought to be expressed are connected in such a way as to permit expression of the coding sequence (the nucleic acids encoding the amino acid sequence of a protein). The precise nature of the regulatory regions needed for coding sequence expression may vary from organism to organism, but will in general include a promoter region that, in prokaryotes, contains both the promoter (which directs the initiation of RNA transcription) and the DNA that, when transcribed into RNA, will signal synthesis initiation. Such regions will normally include those 5' noncoding sequences involved with initiation of transcription and translation, such as the enhancer, TATA box, capping sequence, CAAT sequence, and the like.

Under some circumstances, the native genome sought to be modified contains a functional coding sequence but lacks the ability to control the expression of the sequence. In such cases it would be of benefit to modify the genome by the insertion of control region(s). Such sequences include any sequence that functions to modulate replication, transcriptional or translational regulation, and the like. Examples include promoters, signal sequences, propeptide sequences, transcription terminators, polyadenylation sequences, enhancer sequences, attenuator sequences, intron splice site sequences, and the like.

A nucleic acid fragment of interest may additionally be a trait-producing sequence, meaning a sequence conferring some non-native trait upon the organism or cell in which the protein encoded by the trait-producing sequence is expressed. The term "non-native", when used in the context of a trait-producing sequence, means that the trait produced is different from one found in an unmodified organism, which can mean that the organism produces high amounts of a natural substance in comparison to an unmodified organism, or produces a non-natural substance. For example, the genome of a crop plant, such as corn, can be modified to produce higher amounts of an essential amino acid, thus creating a plant of higher nutritional quality, or could be modified to produce proteins not normally produced in plants, such as antibodies. (See U.S. Patent No. 5,202,422 (issued April 13, 1993); U.S. Patent No. 5,639,947 (issued June 17, 1997).) Likewise, the genome of industrially important microorganisms can be modified to make them more useful, such as by inserting new metabolic pathways with the aim of producing novel metabolites, or by improving both new and existing processes such as the production of antibiotics and industrial enzymes. Other useful traits include herbicide resistance, antibiotic resistance, disease resistance, resistance to adverse environmental conditions (e.g., temperature, pH, salt, drought), and the like.

Methods of transforming cells are well known in the art. By "transformed" it is meant a heritable alteration in a cell resulting from the uptake of foreign DNA. Suitable methods include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.

The terms "nucleic acid molecule" and "polynucleotide" are used interchangeably and refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. Non-limiting examples of polynucleotides include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.

A polynucleotide is typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) for thymine (T) when the polynucleotide is RNA). Thus, the term polynucleotide sequence is the alphabetical representation of a polynucleotide molecule. This alphabetical representation can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
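Because a polynucleotide sequence is simply a string over a four-letter alphabet, the conventions above (A, C, G, T for DNA; U in place of T for RNA) carry over directly into code. The following is a minimal illustrative sketch and is not part of the disclosed method. It assumes uppercase or lowercase one-letter codes without IUPAC ambiguity symbols, and the function names `alphabet_of` and `dna_to_rna` are hypothetical.

```python
DNA_LETTERS = frozenset("ACGT")
RNA_LETTERS = frozenset("ACGU")


def alphabet_of(seq: str) -> str:
    """Return 'DNA' or 'RNA' for a sequence written in one-letter base codes.

    A string containing neither T nor U (e.g. 'ACG') is reported as 'DNA'.
    """
    letters = set(seq.upper())
    if letters <= DNA_LETTERS:
        return "DNA"
    if letters <= RNA_LETTERS:
        return "RNA"
    raise ValueError("sequence mixes T and U or contains symbols other than A, C, G, T, U")


def dna_to_rna(seq: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet by substituting U for T."""
    if alphabet_of(seq) != "DNA":
        raise ValueError("expected a DNA sequence")
    return seq.upper().replace("T", "U")


print(alphabet_of("AUGGCU"))  # RNA
print(dna_to_rna("ATGGCT"))   # AUGGCU
```

Representing sequences as plain strings keeps them directly usable by the database and homology-search tools mentioned above.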

A "coding sequence", or a sequence which "encodes" a selected polypeptide, is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide, for example, in vivo when placed under the control of appropriate regulatory sequences (or "control elements"). The boundaries of the coding sequence are typically determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, procaryotic or eucaryotic mRNA, genomic DNA sequences from viral or procaryotic DNA, and even synthetic DNA sequences. A transcription termination sequence may be located 3' to the coding sequence. Other "control elements" may also be associated with a coding sequence. A DNA sequence encoding a polypeptide can be optimized for expression in a selected cell by using the codons preferred by the selected cell to represent the DNA copy of the desired polypeptide coding sequence. "Encoded by" refers to a nucleic acid sequence which codes for a polypeptide sequence, wherein the polypeptide sequence or a portion thereof contains an amino acid sequence of at least 3 to 5 amino acids, more preferably at least 8 to 10 amino acids, and even more preferably at least 15 to 20 amino acids from a polypeptide encoded by the nucleic acid sequence. Also encompassed are polypeptide sequences which are immunologically identifiable with a polypeptide encoded by the sequence.
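The start-codon/stop-codon boundary rule above can be illustrated with a short scan. This is a minimal sketch under stated assumptions and is not a gene-prediction method. It reads only the strand given, uses the standard genetic code (ATG start; TAA, TAG, TGA stops), and ignores introns and alternative start codons. The function name `first_coding_sequence` is hypothetical.

```python
from typing import Optional

# Stop codons of the standard genetic code, written in DNA coding-strand letters.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
START_CODON = "ATG"


def first_coding_sequence(dna: str) -> Optional[str]:
    """Return the first ATG...stop stretch read in frame on the given strand.

    The result includes both the start codon and the stop codon.
    Returns None when no ATG is followed by an in-frame stop codon.
    """
    dna = dna.upper()
    start = dna.find(START_CODON)
    while start != -1:
        # Step through codons in the reading frame defined by this ATG.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop codon: try the next ATG further downstream.
        start = dna.find(START_CODON, start + 1)
    return None


print(first_coding_sequence("CCATGGCTTGATT"))  # ATGGCTTGA
```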

"Operably linked" refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter that is operably linked to a coding sequence (e.g., a reporter expression cassette) is capable of effecting the expression of the coding sequence when the proper enzymes are present. The promoter or other control elements need not be contiguous with the coding sequence, so long as they function to direct the expression thereof. For example, intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence, and the promoter sequence can still be considered "operably linked" to the coding sequence.

A "vector" is capable of transferring gene sequences to target cells. Typically, "vector construct," "expression vector," and "gene transfer vector" mean any nucleic acid construct capable of directing the expression of a gene of interest and which can transfer gene sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as integrating vectors.

An "expression cassette" comprises any nucleic acid construct capable of directing the expression of a gene/coding sequence of interest. Such cassettes can be constructed into a "vector," "vector construct," "expression vector," or "gene transfer vector," in order to transfer the expression cassette into target cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.

Techniques for determining nucleic acid and amino acid "sequence identity" also are known in the art. Typically, such techniques include determining the nucleotide sequence of the mRNA for a gene and/or determining the amino acid sequence encoded thereby, and comparing these sequences to a second nucleotide or amino acid sequence. In general, "identity" refers to an exact nucleotide-to-nucleotide or amino acid-to-amino acid correspondence of two polynucleotide or polypeptide sequences, respectively. Two or more sequences (polynucleotide or amino acid) can be compared by determining their "percent identity." The percent identity of two sequences, whether nucleic acid or amino acid sequences, is the number of exact matches between two aligned sequences divided by the length of the shorter sequence and multiplied by 100. An approximate alignment for nucleic acid sequences is provided by the local homology algorithm of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981). This algorithm can be applied to amino acid sequences by using the scoring matrix developed by Dayhoff, Atlas of Protein Sequences and Structure, M.O. Dayhoff ed., 5 suppl. 3:353-358, National Biomedical Research Foundation, Washington, D.C., USA, and normalized by Gribskov, Nucl. Acids Res. 14(6):6745-6763 (1986). An exemplary implementation of this algorithm to determine percent identity of a sequence is provided by the Genetics Computer Group (Madison, WI) in the "BestFit" utility application. The default parameters for this method are described in the Wisconsin Sequence Analysis Package Program Manual, Version 8 (1995) (available from Genetics Computer Group, Madison, WI). A preferred method of establishing percent identity in the context of the present invention is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages the Smith-Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated, the "Match" value reflects "sequence identity." Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art; for example, another alignment program is BLAST, used with default parameters. For example, BLASTN and BLASTP can be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs can be found at the following internet address: http://www.ncbi.nlm.gov/cgi-bin/BLAST.
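The percent-identity definition given above (exact matches in an alignment, divided by the length of the shorter sequence, multiplied by 100) can be written out directly. This sketch assumes the alignment has already been produced by a tool such as BestFit, MPSRCH or BLAST and is supplied as two equal-length strings with "-" marking gaps. It does not perform the Smith-Waterman alignment itself. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Counts columns where both sequences carry the same (non-gap) residue,
    divides by the ungapped length of the shorter sequence, and multiplies by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("cannot compute identity for an empty sequence")
    return 100.0 * matches / shorter


print(round(percent_identity("ACGTAC", "ACCTAC"), 1))  # 83.3
print(percent_identity("ACGT-A", "ACGTTA"))            # 100.0
```

Because the denominator is the shorter sequence, a short query that aligns perfectly within a longer one scores 100%, as in the second call.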

Alternatively, homology can be determined by hybridization of polynucleotides under conditions that form stable duplexes between homologous regions, followed by digestion with single-stranded-specific nuclease(s), and size determination of the digested fragments. Two DNA, or two polypeptide, sequences are "substantially homologous" to each other when the sequences exhibit at least about 80%-85%, preferably at least about 85%-90%, more preferably at least about 90%-95%, and most preferably at least about 95%-98% sequence identity over a defined length of the molecules, as determined using the methods above. As used herein, substantially homologous also refers to sequences showing complete identity to the specified DNA or polypeptide sequence. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.

Two nucleic acid fragments are considered to "selectively hybridize" as described herein. The degree of sequence identity between two nucleic acid molecules affects the efficiency and strength of hybridization events between such molecules. A partially identical nucleic acid sequence will at least partially inhibit a completely identical sequence from hybridizing to a target molecule. Inhibition of hybridization of the completely identical sequence can be assessed using hybridization assays that are well known in the art (e.g., Southern blot, Northern blot, solution hybridization, or the like; see Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.). Such assays can be conducted using varying degrees of selectivity, for example, using conditions varying from low to high stringency. If conditions of low stringency are employed, the absence of non-specific binding can be assessed using a secondary probe that lacks even a partial degree of sequence identity (for example, a probe having less than about 30% sequence identity with the target molecule), such that, in the absence of non-specific binding events, the secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a nucleic acid probe is chosen that is complementary to a target nucleic acid sequence, and then by selection of appropriate conditions the probe and the target sequence "selectively hybridize," or bind, to each other to form a hybrid molecule. A nucleic acid molecule that is capable of hybridizing selectively to a target sequence under "moderately stringent" conditions typically hybridizes under conditions that allow detection of a target nucleic acid sequence of at least about 10-14 nucleotides in length having at least approximately 70% sequence identity with the sequence of the selected nucleic acid probe. Stringent hybridization conditions typically allow detection of target nucleic acid sequences of at least about 10-14 nucleotides in length having a sequence identity of greater than about 90-95% with the sequence of the selected nucleic acid probe. Hybridization conditions useful for probe/target hybridization where the probe and target have a specific degree of sequence identity can be determined as is known in the art (see, for example, Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames and S.J. Higgins, (1985) Oxford; Washington, DC; IRL Press).

With respect to stringency conditions for hybridization, it is well known in the art that numerous equivalent conditions can be employed to establish a particular stringency by varying, for example, the following factors: the length and nature of probe and target sequences, base composition of the various sequences, concentrations of salts and other hybridization solution components, the presence or absence of blocking agents in the hybridization solutions (e.g., formamide, dextran sulfate, and polyethylene glycol), hybridization reaction temperature and time parameters, as well as varying wash conditions. A particular set of hybridization conditions is selected following standard methods in the art (see, for example, Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.).

A first polynucleotide is "derived from" a second polynucleotide if it has the same or substantially the same basepair sequence as a region of the second polynucleotide, its cDNA, or complements thereof, or if it displays sequence identity as described above.

A first polypeptide is "derived from" a second polypeptide if it (i) is encoded by a first polynucleotide derived from a second polynucleotide, or (ii) displays sequence identity to the second polypeptide as described above. In the present invention, when a recombinase is "derived from a phage," the recombinase need not be explicitly produced by the phage itself; the phage is simply considered to be the original source of the recombinase and coding sequences thereof. Recombinases can, for example, be produced recombinantly or synthetically, by methods known in the art, or, alternatively, recombinases may be purified from phage-infected bacterial cultures.

"Substantially purified" generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, polypeptide composition) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are well known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.

1.0.0 The Invention 

The invention disclosed herein comprises a method of specifically modifying a genome. In one embodiment of the method, a cell having a target recombination sequence (designated attT) is transformed with a nucleic acid construct (a "targeting construct") comprising a second recombination sequence (designated attD) and one or more polynucleotides of interest. Into the same cell a recombinase is introduced that specifically recognizes the recombination sequences under conditions such that the nucleic acid sequence of interest is inserted into the genome via a recombination event between attT and attD. Alternatively, the recombinase can be introduced into the cell prior to or concurrent with introduction of the targeting construct.

The method of the invention is based, in part, on the discovery that there exist in various genomes specific nucleic acid sequences, herein called pseudo-recombination sequences, that may be distinct from wild-type recombination sequences and that can be recognized by a site-specific recombinase and used to promote the insertion of heterologous genes or polynucleotides into the genome. The inventors have identified such pseudo-recombination sequences in a variety of organisms, including mammals and plants.

1.1.0 Recombinases 

Two major families of site-specific recombinases from bacteria and unicellular yeasts have been described: the integrase family includes Cre, Flp, R, and λ integrase (Argos, et al., EMBO J. 5:433-440, 1986), and the resolvase/invertase family includes some phage integrases, such as those of phages φC31, R4, and TP-901 (Hallet and Sherratt, FEMS Microbiol. Rev. 21:157-178, 1997). While not wishing to be bound by descriptions of mechanisms, strand exchange catalyzed by site-specific recombinases typically occurs in two steps, (1) cleavage and (2) rejoining, involving a covalent protein-DNA intermediate formed between the recombinase enzyme and the DNA strand(s).

The nature of the catalytic amino acid residue of the recombinase enzyme and the line of entry of the nucleophile can be different for the two recombinase families. For cleavage catalyzed by the invertase/resolvase family, for example, the nucleophile hydroxyl is derived from a serine and the leaving group is the 3'-OH of the deoxyribose. For the integrase family, the catalytic residue is, for example, a tyrosine and the leaving group is the 5'-OH. In both recombinase families, the rejoining step is the reverse of the cleavage step. Recombinases particularly useful in the practice of the invention are those that function in a wide variety of cell types, in part because they do not require any host-specific factors. Suitable recombinases include Cre, Flp, R, and the integrases of phages φC31, TP901-1, R4, and the like. Some characteristics of the two recombinase families are discussed below.

1.1.1 Cre-like Recombinases

The recombinase activity of Cre has been studied as a model system for the integrases. Cre is a 38 kD protein isolated from bacteriophage P1. It catalyzes recombination at a 34 basepair stretch of DNA called loxP. The loxP site has the sequence 5'-ATAACTTCGTATA GCATACAT TATACGAAGTTAT-3', consisting of two thirteen basepair palindromic repeats flanking an eight basepair core sequence. The repeat sequences act as Cre binding sites, with the crossover point occurring in the core. Each repeat appears to bind one protein molecule, wherein the DNA substrate (one strand) is cleaved and a protein-DNA intermediate is formed having a 3'-phosphotyrosine linkage between Cre and the cleaved DNA strand. Crystallography and other studies suggest that four proteins and two loxP sites form a synapsed structure in which the DNA resembles models of four-way Holliday-junction intermediates, followed by the exchange of a second set of strands to resolve the intermediate into recombinant products (see Guo, et al., Nature 389:40-46, 1997). The asymmetry of the core region is responsible for directionality of the recombination reaction. If the two recombination sites are repeated in the same orientation, the outcome of strand exchange is integration or excision. If the two sites are placed in the opposite orientation, the outcome is inversion of the sequence between the two sites (Yang and Mizuuchi, Structure 5:1401-1406, 1997).
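The loxP architecture described above (two 13 bp inverted repeats around an asymmetric 8 bp core) can be checked with plain string operations. So can the dependence of the outcome on site orientation: excision between direct repeats, inversion between inverted sites. The sketch below is an illustrative model only. It treats each loxP site as an indivisible unit rather than modelling the staggered strand exchange within the core. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# loxP as given in the text: 13 bp repeat, 8 bp core, 13 bp repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_site(site: str, arm: int = 13, core: int = 8):
    """Split a site into (left repeat, core, right repeat)."""
    if len(site) != 2 * arm + core:
        raise ValueError("unexpected site length")
    return site[:arm], site[arm:arm + core], site[arm + core:]


def excise(molecule: str, site: str):
    """Directly repeated sites: remove the segment between them.

    Returns the retained molecule (one site left behind) and the excised
    circle, written linearly as the intervening segment plus one site.
    """
    first = molecule.find(site)
    second = molecule.find(site, first + len(site)) if first != -1 else -1
    if second == -1:
        raise ValueError("two directly repeated sites are required")
    between = molecule[first + len(site):second]
    retained = molecule[:first] + site + molecule[second + len(site):]
    return retained, between + site


def invert(molecule: str, site: str) -> str:
    """Inverted sites: flip the segment lying between them."""
    inverted_site = reverse_complement(site)
    first = molecule.find(site)
    second = molecule.find(inverted_site, first + len(site)) if first != -1 else -1
    if second == -1:
        raise ValueError("a site followed by an inverted copy is required")
    between = molecule[first + len(site):second]
    return (molecule[:first] + site + reverse_complement(between)
            + inverted_site + molecule[second + len(inverted_site):])


left, core, right = split_site(LOXP)
print(len(LOXP), right == reverse_complement(left), core == reverse_complement(core))
# 34 True False
```

The final check shows that the two repeats are exact inverted copies while the core is not its own reverse complement. This core asymmetry is what gives each site a direction.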

Cre has been shown to be active in a wide variety of cellular backgrounds including yeast (Sauer, Mol. Cell. Biol. 7:2087-2096, 1987), plants (Albert, et al., Plant J. 7:649-659, 1995; Dale and Ow, Gene 91:79-85, 1990; Odell, et al., Mol. Gen. Genet. 223:369-378, 1990) and mammals, including both rodent and human cells (van Deursen, et al., Proc. Natl. Acad. Sci. USA 92:7376-7380, 1995; Agah, et al., J. Clin. Invest. 100:169-179, 1997; Baubonis and Sauer, 21:2025-2029, 1993; Sauer and Henderson, New Biologist 2:441-449, 1990). As the loxP site is known only to occur in the P1 phage genome, use of the enzyme in other cell types requires the prior insertion of a loxP site into the genome, which using currently available technologies is generally a low-frequency and random event with all of the drawbacks inherent in such a procedure. The loxP site can be targeted to a specific location by using homologous recombination, but, again, that process occurs at a very low frequency.

Several studies have suggested the possibility that an exact match of the loxP sequence is not required for Cre-mediated recombination (Sternberg, et al., J. Mol. Biol. 150:487-507, 1981; Sauer, J. Mol. Biol. 223:911-928, 1992; Sauer, Nucleic Acids Research 24:4608-4613, 1996). The efficiency of recombination at such sites, however, has generally been three to four orders of magnitude lower than at wild-type loxP. Sauer attempted to identify sequences similar to loxP in the human genome without success (Sauer, Nucleic Acids Research 24:4608-4613, 1996).

Flp, a recombinase of the integrase family with properties similar to those of Cre, has been identified in strains of Saccharomyces cerevisiae that contain 2μ-circle DNA. Flp recognizes a DNA sequence consisting of two thirteen basepair inverted repeats flanking an eight basepair core sequence (5'-GAAGTTCCTATAC TTCTAGAA GAATAGGAACTTC-3') called FRT. A third repeat follows at the 3' end in the natural sequence but does not appear to be required for recombinase activity. Like Cre, Flp is functional in a wide variety of systems including bacteria (Huang, et al., J. Bacteriology 179:6076-6083, 1997), insects (Golic and Lindquist, Cell 59:499-509, 1989; Golic and Golic, Genetics 144:1693-1711, 1996), plants (Lyznik, et al., Nucleic Acids Res. 21:969-975, 1993) and mammals. These studies have likewise required that an FRT sequence be inserted into the genome to be modified.

A related recombinase, known as R, is encoded by the pSR1 plasmid of the yeast Zygosaccharomyces rouxii (Araki, et al., J. Mol. Biol. 182:191-203, 1985). This recombinase may have properties similar to those described above.

In the context of the present invention, when a recombinase normally facilitates recombination between two recombination sites and the sites are essentially the same (e.g., loxP and Cre), the sites are designated recombinase-mediated-recombination sites (RMRS).

1.1.2 Resolvase/Integrase Recombinases

Unlike the Cre/λ integrase family of recombinases, members of the resolvase subfamily of recombinase enzymes typically contain an N-terminal catalytic domain having a high degree (>35%) of sequence homology among the subfamily members (Crellin and Rood, J. Bacteriology 179(16):5148-5156, 1997; Christiansen, et al., J. Bacteriology 178(17):5164-5173, 1996). Like some of the Cre-type recombinases, however, some resolvases do not require host-specific accessory factors (Thorpe and Smith, PNAS USA 95:5505-5510, 1998).

The process of strand exchange used by the resolvases is somewhat different from the process used by Cre. The following description of this process is not intended to be limiting. The resolvases usually make cuts close to the center of the crossover site, and the top and bottom strand cuts are often staggered by 2 basepairs, leaving recessed 5' ends. A protein-DNA linkage is formed between the phosphodiester from the 5' DNA end and a conserved serine residue close to the amino terminus of the recombinase. As with the Cre-like recombinases, two protein units are bound at each crossover site; however, no equivalent to the Holliday junction intermediate is formed (see Stark, et al., Trends in Genetics 8(12):432-439, 1992).

The nucleic acid sequences recognized as recombination sites by a subset of the resolvase family, including some phage integrases, differ in several ways from the recombination site recognized by Cre. The sites used for recognition and recombination of the phage and bacterial DNAs (the native host system) are generally non-identical, although they typically have a common core region of nucleic acids. The bacterial sequence is generally called the attB sequence (bacterial attachment) and the phage sequence is called the attP sequence (phage attachment). Because they are different sequences, recombination will result in a stretch of nucleic acids (called attL or attR, for left and right) that is neither an attB sequence nor an attP sequence. Such a product site is probably functionally unrecognizable as a recombination site to the relevant enzyme, which removes the possibility that the enzyme will catalyze a second recombination reaction that would reverse the first.
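The attB × attP reaction described above, with a crossover in the shared core, can be modelled as string surgery. In the model, a circular construct carrying attP inserts into a linear genome carrying attB and becomes flanked by attL and attR. This is an illustrative sketch only. The arm sequences are hypothetical and chosen for readability; only the 3 bp core TTG echoes a core listed for φC31 in this document. The names `AttSite` and `integrate` are hypothetical as well.

```python
from typing import NamedTuple


class AttSite(NamedTuple):
    """An attachment site split into left arm, shared core, and right arm."""
    left: str
    core: str
    right: str

    def seq(self) -> str:
        return self.left + self.core + self.right


def integrate(genome: str, att_b: AttSite, plasmid: str, att_p: AttSite) -> str:
    """Model attB x attP integration of a circular plasmid into a linear genome.

    The crossover is placed in the shared core, so the products are
    attL = B-arm + core + P'-arm and attR = P-arm + core + B'-arm.
    """
    if att_b.core != att_p.core:
        raise ValueError("attB and attP must share the same core")
    i = genome.find(att_b.seq())
    j = plasmid.find(att_p.seq())
    if i == -1 or j == -1:
        raise ValueError("attB must be in the genome and attP in the plasmid")
    # Open the circular plasmid at attP: the rest of the circle runs from the
    # end of attP, around the origin of the string, back to the start of attP.
    rest = plasmid[j + len(att_p.seq()):] + plasmid[:j]
    att_l = att_b.left + att_b.core + att_p.right
    att_r = att_p.left + att_p.core + att_b.right
    return genome[:i] + att_l + rest + att_r + genome[i + len(att_b.seq()):]


att_b = AttSite("GGCCAA", "TTG", "CCTTAA")   # hypothetical arm sequences
att_p = AttSite("ACGTAC", "TTG", "GTACGT")   # hypothetical arm sequences
genome = "aaaa" + att_b.seq() + "cccc"
plasmid = att_p.seq() + "gggg"                # "gggg" stands in for the payload
product = integrate(genome, att_b, plasmid, att_p)
print(product)  # aaaaGGCCAATTGGTACGTggggACGTACTTGCCTTAAcccc
```

Neither attB nor attP survives in the product; only the hybrid attL and attR sites flank the insert. This is the property relied on above to discourage a second, reversing reaction.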

The individual resolvases and the nucleic acid sequences that they recognize have been less well characterized than Cre and Flp, although many of the core sequences have been identified. The core sequences of some of the resolvases useful in the practice of the invention can include, without limitation, the following sequences: φC31 - 5'-TTG; TP901-1 - 5'-TCAAT; and R4 - 5'-GAAGCAGTGGTA. (See Rausch and Lehmann, NAR 19:5187-5189, 1991; Shirai, et al., J. Bacteriology 173(13):4237-4239, 1991; Crellin and Rood, J. Bacteriology 179:5148-5156, 1997; Christiansen, et al., J. Bacteriology 176:1069-1076, 1994; Brondsted and Hammer, Applied & Environmental Microbiology 65:752-758, 1999.)

Several authors have suggested that integrase or resolvase (for example, φC31 integrase) can be used to modify bacterial genomes, such as those of E. coli and actinomycetes (Mascarenhas and Olson, US Patent No. 5,470,727; Cox, et al., US Patent No. 5,190,871). However, there has been no suggestion that these enzymes would be useful in the modification of non-bacterial genomes.

1.1.3 Recombination Sites 

The inventors have discovered native recombination sites existing in the genomes of a variety of organisms, where the native recombination site does not necessarily have a nucleotide sequence identical to the wild-type recombination sequences (for a given recombinase), but such native recombination sites are nonetheless sufficient to promote recombination mediated by the recombinase. Such recombination site sequences are referred to herein as "pseudo-recombination sequences." For a given recombinase, a pseudo-recombination sequence is functionally equivalent to a wild-type recombination sequence, occurs in an organism other than that in which the recombinase is found in nature, and may have sequence variation relative to the wild-type recombination sequences.

In the practice of the present invention, wild-type recombination sites, pseudo-recombination sites, and hybrid-recombination sites can be used in a variety of ways in the construction of targeting vectors. Following here are non-limiting examples of how these sites may be employed in the practice of the present invention.

Identification of pseudo-recombination sequences can be accomplished, for example, by using sequence alignment and analysis, where the query sequence is the recombination site of interest (for example, a recombinase-mediated-recombination site (RMRS; e.g., loxP), or either attB and/or attP of a phage/bacterial system). Following here are some examples: if a genomic recombination site (generally designated attT) is identified using attB, then that attT site is said to be a pseudo-attB site; if a genomic recombination site is identified using attP, then that attT site is said to be a pseudo-attP site; and, if a genomic recombination site is identified using an RMRS (e.g., loxP), then that attT site is said to be a pseudo-RMRS site (e.g., pseudo-loxP).
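The search-and-name procedure above can be sketched as a scan of both strands of a genomic sequence for windows resembling the query site. Each hit is labelled according to the query used. This is a simplified stand-in for the alignment tools discussed earlier: identity here is ungapped (matches divided by query length), and the default 90% threshold is an arbitrary illustrative value that does not come from the source. The function and constant names are hypothetical. A quadratic scan like this is only practical on short sequences; whole genomes call for indexed search tools.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Naming convention from the text: the query type decides the label.
PSEUDO_LABELS = {"attB": "pseudo-attB", "attP": "pseudo-attP", "RMRS": "pseudo-RMRS"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_for_pseudo_sites(genome, query, query_kind, min_identity=90.0):
    """Report windows of `genome` resembling `query` on either strand.

    Returns (label, strand, forward-strand start, percent identity) tuples.
    Identity is ungapped: matches in the window divided by the query length.
    """
    label = PSEUDO_LABELS[query_kind]
    genome, query = genome.upper(), query.upper()
    n = len(query)
    hits = []
    for strand, target in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(target) - n + 1):
            window = target[i:i + n]
            matches = sum(1 for a, b in zip(window, query) if a == b)
            identity = 100.0 * matches / n
            if identity >= min_identity:
                # Convert minus-strand offsets back to forward-strand coordinates.
                start = i if strand == "+" else len(genome) - i - n
                hits.append((label, strand, start, round(identity, 1)))
    return hits


LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"
variant = "G" + LOXP[1:-1] + "C"          # hypothetical site, 32/34 identical to loxP
genome = "C" * 10 + variant + "C" * 10
print(scan_for_pseudo_sites(genome, LOXP, "RMRS"))
# [('pseudo-RMRS', '+', 10, 94.1)]
```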

In one aspect of the present invention, the recombinase (for example, Cre) recognizes a recombination site having the following structure: flanking sequence palindrome-core sequence-flanking sequence palindrome. Such recombination sites typically comprise two approximately 10-20 base pair stretches having some palindromic character which flank an approximately 3-15 base pair core sequence.

In this aspect of the present invention, the genome of a target cell is searched for sequences having sequence identity to the selected recombination site for a given recombinase, for example, loxP (Example 1; Figure 8). The cellular target recombination site (attT; in this example, a pseudo-loxP site) accordingly has a defined sequence. To practice the genome modification method of the present invention, a recombination sequence is placed in the targeting vector. This recombination sequence, attD, can take many forms but must be capable of participating in site-specific recombination with the genomic site (attT), where the recombination is mediated by the appropriate recombinase. In this regard, non-limiting examples of attD sites include the following: the attD core sequence matches the pseudo-recombination site core sequence and the flanking sequences in the targeting construct are wild-type recombination sequences (this construct represents a hybrid-recombination site); or, the attD core sequence matches the pseudo-recombination site core sequence and the flanking sequences in the targeting construct match the pseudo-recombination site flanking sequences. Further, the core sequences of attT and attD are generally essentially the same, and the flanking sequences for attD may be combinations of flanking sequences from wild-type and pseudo-recombination site sources.

The recombinase-mediated-recombination site (RMRS)
of this type of recombinase, for example, Cre and
Cre-like recombinases, can have the following structure:
a first DNA sequence (RMRS5'), a core region A, and a
second DNA sequence (RMRS3'), in the relative order
RMRS5'-core region A-RMRS3'. Such recombination sites
typically comprise two regions of approximately 10-20
base pairs having palindromic characteristics (e.g.,
RMRS5' and RMRS3'), which flank an approximately 3-15
base pair core sequence (for example, core region A). In
one embodiment, e.g., when employing Cre,
hybrid-recombination sites may be used in which the
palindromic sequences are derived from a wild-type
recombination site and the core sequence is derived from
a pseudo-recombination site.

Without being bound to any particular theory or
mechanism of action, when such a nucleic acid construct
is provided to a cell along with a site-specific
recombinase, the recombinase may recognize and bind to
the flanking sequences of both the hybrid-recombination
sequence and the pseudo-recombination sequence from
which the core sequence was derived, and catalyze
recombination between the two.

In one embodiment, the attD (in the targeting
construct) is a hybrid-lox sequence comprising two
wild-type thirteen-basepair loxP palindromes flanking a
heterologous core sequence, where the core sequence
corresponds to the core sequence of the
pseudo-recombination sequence of attT (in the cell
target). In a second embodiment, the attD (in the
targeting construct) is a hybrid-FRT sequence comprising
two or three wild-type thirteen-basepair palindromes
flanking a heterologous core sequence, where the core
sequence corresponds to the core sequence of the
pseudo-recombination sequence of attT (in the cell
target).
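The hybrid-lox embodiment amounts to placing a pseudo-site core between wild-type palindromes. A minimal sketch, assuming the genomic pseudo-site has already been aligned to loxP so that its core sits between two 13 bp flanks; `core_of` and `hybrid_site` are hypothetical helpers, and the pseudo-site sequence in the usage line is invented for illustration.

```python
"""Sketch: assemble a hybrid-lox attD from the wild-type loxP 13 bp
palindromes and the core of an aligned genomic pseudo-lox site (attT)."""

LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # wild-type loxP
FLANK = 13  # length of each loxP palindrome


def core_of(site: str, flank_len: int = FLANK) -> str:
    """Return the core lying between two flank_len-bp flanks."""
    return site[flank_len:len(site) - flank_len]


def hybrid_site(pseudo_site: str, wild_type: str = LOXP,
                flank_len: int = FLANK) -> str:
    """Wild-type left palindrome + pseudo-site core + wild-type right palindrome."""
    core = core_of(pseudo_site, flank_len)
    if not core:
        raise ValueError("pseudo-site has no core between its flanks")
    left = wild_type[:flank_len]
    right = wild_type[len(wild_type) - flank_len:]
    return left + core + right


if __name__ == "__main__":
    # Hypothetical aligned pseudo-lox site (flanks diverge from loxP).
    pseudo = "ATTACTTCGTACA" + "GCATTCAT" + "TGTACGAAGTTAT"
    print(hybrid_site(pseudo))
```

For the hypothetical site above, the result keeps the pseudo-site core `GCATTCAT` between the unchanged loxP palindromes; a hybrid-FRT site would follow the same pattern with FRT palindromes.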

Example 2 describes methods for testing whether a
putative recombination site is functional as a
pseudo-recombination site for recombination mediated by
the selected site-specific recombinase, as well as
methods for assessing the efficiency of recombination.

In a second aspect of the present invention, the
recombinase (for example, ΦC31) recognizes a
recombination site in which the sequence of the 5'
region can differ from the sequence of the 3' region.
For example, for the phage ΦC31 attP (the phage
attachment site), the core region is 5'-TTG-3' and the
flanking sequences on either side are represented here
as attP5' and attP3'; the structure of the attP
recombination site is, accordingly, attP5'-TTG-attP3'.
Correspondingly, for the native bacterial genomic target
site (attB), the core region is 5'-TTG-3' and the
flanking sequences on either side are represented here
as attB5' and attB3'; the structure of the attB
recombination site is, accordingly, attB5'-TTG-attB3'.
After a single-site, ΦC31-integrase-mediated
recombination event, the result is the following
recombination product:
attB5'-TTG-attP3'{ΦC31 vector sequences}attP5'-TTG-attB3'.
Typically, after recombination the resulting
recombination sites are no longer able to act as
substrates for the ΦC31 recombinase. This results in
stable integration with little or no recombinase-mediated
excision. These structures can be represented more
generically as follows: circular targeting vector
comprising the recombination site (attD) and a
polynucleotide of interest: attD5'-core-attD3';
pseudo-recombination site (attT): attT5'-core-attT3';
post-recombination structure: attT5'-recombination
product site (e.g., core)-attD3'{polynucleotide
sequences of interest}attD5'-recombination product site
(e.g., core)-attT3'. The recombination product site
sequence can comprise a core identical to the original
core sequence. However, the complete post-recombination
sites (for example, attT5'-recombination product site
(e.g., core)-attD3') generally no longer provide a usable
substrate for the recombinase.
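The layout of the integration product can be checked with a symbolic sketch. It tracks segment labels rather than sequences and assumes a single crossover within an identical core; `Site` and `integrate` are hypothetical helpers, not part of any described software.

```python
"""Sketch: symbolic model of single-crossover integration of a circular
donor (attD) into a genomic target (attT) at a shared core."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Site:
    left: str   # 5' flanking arm label, e.g. "attB5'"
    core: str   # shared core, e.g. "TTG"
    right: str  # 3' flanking arm label, e.g. "attB3'"


def integrate(target: Site, donor: Site, payload: List[str]) -> List[str]:
    """Return the post-recombination layout when a circular donor
    (donor.left-core-donor.right, then payload, closing back to donor.left)
    recombines with a genomic target site at their common core."""
    if target.core != donor.core:
        raise ValueError("cores differ; crossover at the core is not modelled")
    return [target.left, target.core, donor.right, *payload,
            donor.left, donor.core, target.right]


if __name__ == "__main__":
    attB = Site("attB5'", "TTG", "attB3'")
    attP = Site("attP5'", "TTG", "attP3'")
    print("-".join(integrate(attB, attP, ["{phiC31 vector sequences}"])))
```

Run as written, the example prints `attB5'-TTG-attP3'-{phiC31 vector sequences}-attP5'-TTG-attB3'`, the same arrangement as the ΦC31 product given in the text; substituting attT and attD labels reproduces the generic form.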

In this aspect, when selecting pseudo-recombination
sites in a target cell (attT), the genomic sequences of
the target cell can be searched for suitable
pseudo-recombination sites using either the attP or attB
sequences associated with a particular recombinase.
The functional sizes of these recombination sequences,
and the amount of heterogeneity they can tolerate, can
be evaluated, for example, as described in Examples 8
and 9.

When a pseudo-recombination site is identified
using either attP or attB search sequences, the other
recombination site can be used in the targeting
construct. For example, if attP for a selected
recombinase is used to identify a pseudo-recombination
site in the target cell genome, then the wild-type attB
sequence can be used in the targeting construct. In an
alternative example, if attB for a selected recombinase
is used to identify a pseudo-recombination site in the
target cell genome, then the wild-type attP sequence can
be used in the targeting construct.
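The pairing rule reduces to a small lookup. In this sketch, `plan_targeting` is a hypothetical helper covering only the attP/attB case; RMRS-type sites such as loxP are handled with hybrid or matched sites as described earlier and are not modelled here.

```python
"""Sketch: given the wild-type site used to find a genomic pseudo-site,
name the genomic site and choose the wild-type partner for attD."""
from typing import Tuple

PARTNER = {"attP": "attB", "attB": "attP"}


def plan_targeting(query_site: str) -> Tuple[str, str]:
    """Return (name of genomic pseudo-site, wild-type site to use as attD)."""
    try:
        partner = PARTNER[query_site]
    except KeyError:
        raise ValueError(f"expected 'attP' or 'attB', got {query_site!r}") from None
    return f"pseudo-{query_site}", partner


if __name__ == "__main__":
    for query in ("attP", "attB"):
        print(query, "->", plan_targeting(query))
```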

The targeting constructs contemplated by the
invention may contain additional nucleic acid fragments
such as control sequences, marker sequences, selection
sequences and the like, as discussed below.

1.2.0 Targeting Constructs and Methods of the Present Invention

The present invention also provides means for
targeted insertion of a polynucleotide (or nucleic acid
sequence(s)) of interest into a genome by, for example,
(i) providing a recombinase, wherein the recombinase is
capable of facilitating recombination between a first
recombination site and a second recombination site;
(ii) providing a targeting construct having a first
recombination sequence and a polynucleotide of interest;
and (iii) introducing the recombinase and the targeting
construct into a cell which contains in its nucleic acid
the second recombination site, wherein said introducing
is done under conditions that allow the recombinase to
facilitate a recombination event between the first and
second recombination sites.

Historically, the attachment site in a bacterial
genome is designated "attB" and in a corresponding
bacteriophage the site is designated "attP". A
recombination site in a cell of interest is designated
herein as "attT". A recombination site in a targeting
vector is referred to herein as "attD".

In one aspect of the present invention, at least
one pseudo-recombination site for a selected recombinase
is identified in a target cell of interest (attT).
These sites can be identified by several methods,
including searching all known sequences derived from the
cell of interest against a wild-type recombination site
(e.g., attB or attP) for a selected recombinase (e.g.,
as described in Example 1). The functionality of
pseudo-recombination sites identified in this way can
then be empirically evaluated following the teachings of
the present specification to determine their ability to
participate in a recombinase-mediated recombination
event.
1.2.1 Targeting Constructs of the Present Invention

A targeting construct, to direct integration to
this pseudo-recombination site, would then comprise a
recombination site (attD), wherein the recombinase can
facilitate a recombination event between attT and attD,
and a polynucleotide of interest. Polynucleotides of
interest can include, but are not limited to, expression
cassettes encoding polypeptide products. Targeting
constructs of the present invention are typically
circular and may also contain selectable markers, an
origin of replication, and other elements.

A variety of expression vectors are suitable for
use in the practice of the present invention, both for
prokaryotic expression and eukaryotic expression. In
general, the targeting construct will have one or more
of the following features: a promoter, promoter-enhancer
sequences, a selection marker sequence, an origin of
replication, an inducible element sequence, an
epitope-tag sequence, and the like.

Promoter and promoter-enhancer sequences are
DNA sequences to which RNA polymerase binds and
initiates transcription. The promoter determines the
polarity of the transcript by specifying which strand
will be transcribed. Bacterial promoters consist of
consensus sequences located at approximately -35 and -10
nucleotides relative to the transcriptional start, which
are bound by a specific sigma factor and RNA polymerase.
Eukaryotic promoters are more complex. Most promoters
utilized in expression vectors are transcribed by RNA
polymerase II. General transcription factors (GTFs)
first bind specific sequences near the transcriptional
start and then recruit RNA polymerase II. In addition to
these minimal promoter elements, small sequence elements
are recognized specifically by modular DNA-binding/
transactivating proteins (e.g., AP-1, SP-1) that regulate
the activity of a given promoter. Viral promoters serve
the same function as bacterial or eukaryotic promoters
and either provide a specific RNA polymerase in trans
(bacteriophage T7) or recruit cellular factors and RNA
polymerase (SV40, RSV, CMV). Viral promoters may be
preferred because they are generally particularly strong
promoters.
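The -35/-10 architecture of bacterial promoters can be illustrated by a simple scan. This sketch assumes the textbook E. coli sigma-70 consensus boxes (TTGACA at -35, TATAAT at -10), a spacer of 15-19 bp, and a per-box mismatch limit; real promoter prediction uses weighted models, so this is only an illustration of the layout, and the function names are hypothetical.

```python
"""Sketch: flag candidate sigma-70-type bacterial promoters as a -35 box and
a -10 box separated by a plausible spacer.

Assumptions: consensus TTGACA (-35) and TATAAT (-10), a 15-19 bp spacer,
and at most max_mm mismatches per box. Positions are 0-based in seq."""
from typing import List, Tuple

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
BOX = 6  # both consensus boxes are hexamers


def mismatches(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq: str, max_mm: int = 1,
                             spacers: range = range(15, 20)
                             ) -> List[Tuple[int, int, int]]:
    """Return (start of -35 box, start of -10 box, total mismatches)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - BOX + 1):
        mm35 = mismatches(seq[i:i + BOX], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + BOX + spacer
            box10 = seq[j:j + BOX]
            if len(box10) < BOX:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return hits


if __name__ == "__main__":
    demo = "GC" * 5 + "TTGACA" + "G" * 17 + "TATAAT" + "GC" * 5
    print(find_promoter_candidates(demo))
```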

Promoters may, furthermore, be either
constitutive or regulatable (i.e., inducible or
derepressible). Inducible elements are DNA sequence
elements which act in conjunction with promoters and
bind either repressors (e.g., the lacO/lacIq repressor
system in E. coli) or inducers (e.g., the gal1/GAL4
inducer system in yeast). In either case, transcription
is virtually "shut off" until the promoter is derepressed
or induced, at which point transcription is "turned on."

Examples of constitutive promoters include the
int promoter of bacteriophage λ, the bla promoter of the
β-lactamase gene sequence of pBR322, the CAT promoter of
the chloramphenicol acetyl transferase gene sequence of
pBR325, and the like. Examples of inducible prokaryotic
promoters include the major right and left promoters of
bacteriophage λ (PL and PR), the trp, recA, lacZ, AraC
and gal promoters of E. coli, the α-amylase (Ulmanen et
al., J. Bacteriol. 162:176-182, 1985) and the
sigma-28-specific promoters of B. subtilis (Gilman et
al., Gene 32:11-20, 1984), the promoters of the
bacteriophages of Bacillus (Gryczan, In: The Molecular
Biology of the Bacilli, Academic Press, Inc., NY, 1982),
Streptomyces promoters (Ward et al., Mol. Gen. Genet.
203:468-478, 1986), and the like. Exemplary prokaryotic
promoters are reviewed by Glick (J. Ind. Microbiol.
1:277-282, 1987), Cenatiempo (Biochimie 68:505-516,
1986), and Gottesman (Ann. Rev. Genet. 18:415-442,
1984).

Preferred eukaryotic promoters include, but
are not limited to, the following: the promoter of the
mouse metallothionein I gene sequence (Hamer et al., J.
Mol. Appl. Gen. 1:273-288, 1982); the TK promoter of
Herpes virus (McKnight, Cell 31:355-365, 1982); the SV40
early promoter (Benoist et al., Nature (London)
290:304-310, 1981); the yeast gal1 gene sequence
promoter (Johnston et al., Proc. Natl. Acad. Sci. (USA)
79:6971-6975, 1982; Silver et al., Proc. Natl. Acad.
Sci. (USA) 81:5951-5955, 1984); the CMV promoter; the
EF-1 promoter; ecdysone-responsive promoter(s); the
tetracycline-responsive promoter; and the like.

Exemplary promoters for use in the present
invention are selected such that they are functional in
the cell type (and/or animal or plant) into which they
are being introduced.

Selection markers are valuable elements in
expression vectors as they provide a means to select for
growth of only those cells that contain a vector. Such
markers are of two types: drug resistance and
auxotrophic. A drug resistance marker enables cells to
detoxify an exogenously added drug that would otherwise
kill the cell. Auxotrophic markers allow cells to
synthesize an essential component (usually an amino
acid) while grown in media that lacks that essential
component.

WO 00/11155 PCT/US99/18987

Common selectable marker genes include those for
resistance to antibiotics such as ampicillin,
tetracycline, kanamycin, bleomycin, streptomycin,
hygromycin, neomycin, Zeocin™, and the like. Selectable
auxotrophic genes include, for example, hisD, which
allows growth in histidine-free media in the presence of
histidinol.

A further element useful in an expression vector is
an origin of replication. Replication origins are
unique DNA segments that contain multiple short repeated
sequences that are recognized by multimeric
origin-binding proteins and that play a key role in
assembling DNA replication enzymes at the origin site.
Suitable origins of replication for use in expression
vectors employed herein include E. coli oriC, the ColE1
plasmid origin, 2μ and ARS (both useful in yeast
systems), sf1, SV40, EBV oriP (useful in mammalian
systems), and the like.

Epitope tags are short peptide sequences that are
recognized by epitope-specific antibodies. A fusion
protein comprising a recombinant protein and an epitope
tag can be easily purified using an antibody bound to a
chromatography resin. The presence of the epitope tag
furthermore allows the recombinant protein to be
detected in subsequent assays, such as Western blots,
without having to produce an antibody specific for the
recombinant protein itself. Examples of commonly used
epitope tags include V5, glutathione-S-transferase
(GST), hemagglutinin (HA), the peptide
Phe-His-His-Thr-Thr, chitin binding domain, and the like.
A further useful element in an expression vector is
a multiple cloning site or polylinker. Synthetic DNA
encoding a series of restriction endonuclease
recognition sites is inserted into a plasmid vector, for
example, downstream of the promoter element. These
sites are engineered for convenient cloning of DNA into
the vector at a specific position.
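Because each polylinker site is meant to direct cloning to one position, a quick check is whether an enzyme cuts the circular vector only once. The sketch below assumes a small illustrative table of well-known enzymes (EcoRI, BamHI, HindIII, XhoI, NotI) whose recognition sites are palindromic, so scanning one strand suffices; the helper names are hypothetical, and this is not a substitute for dedicated vector-analysis software.

```python
"""Sketch: list enzymes whose recognition site occurs exactly once in a
circular vector, i.e. candidates for cloning at a single position.

Assumption: every site in ENZYMES is palindromic, so one strand is enough."""
from typing import Dict, List

ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def count_sites_circular(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of site in a circular seq,
    including matches that span the end/start junction."""
    seq = seq.upper()
    wrapped = seq + seq[:len(site) - 1]  # let matches run across the junction
    return sum(1 for i in range(len(seq)) if wrapped.startswith(site, i))


def unique_cutters(vector: str, enzymes: Dict[str, str] = ENZYMES) -> List[str]:
    """Return, sorted by name, the enzymes that cut the circular vector once."""
    return sorted(name for name, site in enzymes.items()
                  if count_sites_circular(vector, site) == 1)


if __name__ == "__main__":
    toy_vector = "GAATTC" + "AAAA" + "GGATCC" + "TTTT" + "GGATCC"
    print(unique_cutters(toy_vector))
```

In the toy vector, BamHI appears twice and so is excluded, leaving EcoRI as the only single cutter.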

The foregoing elements can be combined to produce
expression vectors suitable for use in the methods of
the invention. Those of skill in the art would be able
to select and combine the elements suitable for use in
their particular system in view of the teachings of the
present specification. Suitable prokaryotic vectors
include plasmids such as those capable of replication in
E. coli (for example, pBR322, ColE1, pSC101, pACYC184,
πVX, pRSET, pBAD (Invitrogen, Carlsbad, CA), and the
like). Such plasmids are disclosed by Sambrook (cf.
"Molecular Cloning: A Laboratory Manual," second
edition, edited by Sambrook, Fritsch, & Maniatis, Cold
Spring Harbor Laboratory, 1989). Bacillus plasmids
include pC194, pC221, pT127, and the like, and are
disclosed by Gryczan (In: The Molecular Biology of the
Bacilli, Academic Press, NY (1982), pp. 307-329).
Suitable Streptomyces plasmids include pIJ101 (Kendall
et al., J. Bacteriol. 169:4177-4183, 1987), and
Streptomyces bacteriophages such as ΦC31 (Chater et al.,
In: Sixth International Symposium on Actinomycetales
Biology, Akademiai Kiado, Budapest, Hungary (1986), pp.
45-54). Pseudomonas plasmids are reviewed by John et
al. (Rev. Infect. Dis. 8:693-704, 1986) and Izaki
(Jpn. J. Bacteriol. 33:729-742, 1978).
Suitable eukaryotic plasmids include, for example,
BPV, EBV, vaccinia, SV40, 2-micron circle, pcDNA3.1,
pcDNA3.1/GS, pYES2/GS, pMT, pIND, pIND(SP1), pVgRXR
(Invitrogen), and the like, or their derivatives. Such
plasmids are well known in the art (Botstein et al.,
Miami Wntr. Symp. 19:265-274, 1982; Broach, In: "The
Molecular Biology of the Yeast Saccharomyces: Life Cycle
and Inheritance", Cold Spring Harbor Laboratory, Cold
Spring Harbor, NY, pp. 445-470, 1981; Broach, Cell
28:203-204, 1982; Dilon et al., J. Clin. Hematol.
Oncol. 10:39-48, 1980; Maniatis, In: Cell Biology: A
Comprehensive Treatise, Vol. 3, Gene Expression,
Academic Press, NY, pp. 563-608, 1980).

The targeting cassettes described herein can be
constructed utilizing methodologies known in the art of
molecular biology (see, for example, Ausubel or
Maniatis) in view of the teachings of the specification.
As described above, the targeting constructs are
assembled by inserting, into a suitable vector backbone,
an attD (recombination site), polynucleotides encoding
sequences of interest operably linked to a promoter of
interest, and, optionally, a sequence encoding a positive
selection marker.

A preferred method of obtaining polynucleotides,
including suitable regulatory sequences (e.g.,
promoters), is PCR. General procedures for PCR are
taught in MacPherson et al., PCR: A Practical Approach
(IRL Press at Oxford University Press, 1991). PCR
conditions for each application may be empirically
determined. A number of parameters influence the success
of a reaction. Among these parameters are annealing
temperature and time, extension time, Mg2+ and ATP
concentration, pH, and the relative concentration of
primers, templates and deoxyribonucleotides. After
amplification, the resulting fragments can be detected
by agarose gel electrophoresis followed by visualization
with ethidium bromide staining and ultraviolet
illumination.

The expression cassettes, targeting constructs,
vectors, recombinases and recombinase-coding sequences
of the present invention can be formulated into kits.
Components of such kits can include, but are not limited
to, containers, instructions, solutions, buffers,
disposables, and hardware.

1.2.2 Introducing Recombinases 

In the methods of the invention, a site-specific
recombinase is introduced into a cell whose genome is to
be modified. Methods of introducing functional proteins
into cells are well known in the art. Introduction of
purified recombinase protein ensures a transient
presence of the protein and its function, which is often
a preferred embodiment. Alternatively, a gene encoding
the recombinase can be included in an expression vector
used to transform the cell. It is generally preferred
that the recombinase be present only for as long as is
necessary for insertion of the nucleic acid fragments
into the genome being modified. Thus, the lack of
permanence associated with most expression vectors is
not expected to be detrimental.

The recombinases used in the practice of the
present invention can be introduced into a target cell
before, concurrently with, or after the introduction of
a targeting vector. The recombinase can be directly
introduced into a cell as a protein, for example, using
liposomes, coated particles, or microinjection.
Alternately, a polynucleotide encoding the recombinase
can be introduced into the cell using a suitable
expression vector. The targeting vector components
described above are useful in the construction of
expression cassettes containing sequences encoding a
recombinase of interest. Expression of the recombinase
is typically desired to be transient. Accordingly,
vectors providing transient expression of the
recombinase are preferred in the practice of the present
invention. However, expression of the recombinase can
be regulated in other ways, for example, by placing the
expression of the recombinase under the control of a
regulatable promoter (i.e., a promoter whose expression
can be selectively induced or repressed).

Sequences encoding recombinases useful in the
practice of the present invention are known and include,
but are not limited to, the following: Cre (Sternberg
et al., J. Mol. Biol. 187:197-212); ΦC31 (Kuhstoss and
Rao, J. Mol. Biol. 222:897-908, 1991); TP901-1
(Christiansen et al., J. Bact. 178:5164-5173, 1996);
and R4 (Matsuura et al., J. Bact. 178:3374-3376, 1996).

Recombinases for use in the practice of the present
invention can be produced recombinantly or purified as
previously described. Polypeptides having the desired
recombinase activity can be purified to a desired degree
of purity by methods known in the art of protein
purification, including, but not limited to, ammonium
sulfate precipitation, size fractionation, affinity
chromatography, HPLC, ion exchange chromatography, and
heparin agarose affinity chromatography (e.g., Thorpe &
Smith, Proc. Natl. Acad. Sci. 95:5505-5510, 1998).

1.2.3 Cells 

Cells suitable for modification employing the
methods of the invention include both prokaryotic cells
and eukaryotic cells, provided that the cell's genome
contains a pseudo-recombination sequence. Prokaryotic
cells are cells that lack a defined nucleus. Examples
of suitable prokaryotic cells include bacterial cells,
mycoplasmal cells and archaebacterial cells.
Particularly preferred prokaryotic cells include those
that are useful either in various types of test systems
(discussed in greater detail below) or those that have
some industrial utility, such as Klebsiella oxytoca
(ethanol production), Clostridium acetobutylicum
(butanol production), and the like (see Green and
Bennett, Biotech & Bioengineering 58:215-221, 1998;
Ingram et al., Biotech & Bioengineering 58:204-206,
1998). Suitable eukaryotic cells include both animal
cells (such as from insect, rodent, cow, goat, rabbit,
sheep, non-human primate, human, and the like) and plant
cells (such as rice, corn, cotton, tobacco, tomato,
potato, and the like). Cell types applicable to
particular purposes are discussed in greater detail
below.

Yet another embodiment of the invention comprises
isolated genetically engineered cells. Suitable cells
may be prokaryotic or eukaryotic, as discussed above.
The genetically engineered cells of the invention may be
unicellular organisms or may be derived from
multicellular organisms. By "isolated" in reference to
genetically engineered cells derived from multicellular
organisms, it is meant that the cells are outside a
living body, whether plant or animal, and in an
artificial environment. The use of the term isolated
does not imply that the genetically engineered cells are
the only cells present.

In one embodiment, the genetically engineered cells
of the invention contain any one of the nucleic acid
constructs of the invention. In a second embodiment, a
recombinase that specifically recognizes recombination
sequences is introduced into genetically engineered
cells containing one of the nucleic acid constructs of
the invention under conditions such that the nucleic
acid sequence(s) of interest will be inserted into the
genome. Thus, the genetically engineered cells possess
a modified genome. Methods of introducing such a
recombinase are well known in the art and are discussed
above.

The genetically engineered cells of the invention
can be employed in a variety of ways. Unicellular
organisms can be modified to produce commercially
valuable substances such as recombinant proteins,
industrial solvents, industrially useful enzymes, and
the like. Preferred unicellular organisms include fungi
such as yeast (for example, S. pombe, Pichia pastoris,
S. cerevisiae (such as INVSc1), and the like),
Aspergillus, and the like, and bacteria such as
Klebsiella, Streptomyces, and the like.

Isolated cells from multicellular organisms can be
similarly useful, including insect cells, mammalian
cells and plant cells. Mammalian cells that may be
useful include those derived from rodents, primates and
the like. They include HeLa cells, cells of fibroblast
origin such as VERO, 3T3 or CHO-K1, HEK 293 cells, or
cells of lymphoid origin (such as 32D cells) and their
derivatives. Preferred mammalian host cells include
nonadherent cells such as CHO, 32D, and the like.

In addition, plant cells are also available as
hosts, and control sequences compatible with plant cells
are available, such as the cauliflower mosaic virus 35S
and 19S, nopaline synthase promoter and polyadenylation
signal sequences, and the like. Appropriate transgenic
plant cells can be used to produce transgenic plants.

Another preferred host is an insect cell, for
example from Drosophila larvae. Using insect cells
as hosts, the Drosophila alcohol dehydrogenase promoter
can be used (Rubin, Science 240:1453-1459, 1988).
Alternatively, baculovirus vectors can be engineered to
express large amounts of peptide encoded by a desired
nucleic acid sequence in insect cells (Jasny, Science
238:1653, 1987; Miller et al., In: Genetic Engineering
(1986), Setlow, J.K., et al., eds., Plenum, Vol. 8,
pp. 277-297).

The genetically engineered cells of the invention
are additionally useful as tools to screen for
substances capable of modulating the activity of a
protein encoded by a nucleic acid fragment of interest.
Thus, an additional embodiment of the invention
comprises methods of screening comprising contacting
genetically engineered cells of the invention with a
test substance and monitoring the cells for a change in
cell phenotype, cell proliferation, cell
differentiation, enzymatic activity of the protein, or
the interaction between the protein and a natural
binding partner of the protein, when compared to test
cells not contacted with the test substance.

A variety of test substances can be evaluated using
the genetically engineered cells of the invention,
including peptides, proteins, antibodies, low molecular
weight organic compounds, natural products derived from,
for example, fungal or plant cells, and the like. By
"low molecular weight organic compound" is meant a
chemical species with a molecular weight generally
less than 500-1000. Sources of test substances are
well known to those of skill in the art.

Various assay methods employing cells are also well
known by those skilled in the art. They include, for
example, assays for enzymatic activity (Hirth et al., US
5,763,198, issued 6/9/98), assays for binding of a test
substance to a protein expressed by the genetically
engineered cells, assays for transcriptional activation
of a reporter gene, and the like.

Cells modified by the methods of the present
invention can be maintained under conditions that, for
example, (i) keep them alive but do not promote growth,
(ii) promote growth of the cells, and/or (iii) cause the
cells to differentiate or dedifferentiate. Cell culture
conditions are typically permissive for the action of
the recombinase in the cells, although the activity of
the recombinase may also be modulated by culture
conditions (e.g., raising or lowering the temperature at
which the cells are cultured). For a given cell,
cell type, tissue, or organism, culture conditions are
known in the art.
2.0.0 Transgenic Plants and Non-Human Animals

In another embodiment, the present invention
comprises transgenic plants and nonhuman transgenic
animals whose genomes have been modified by employing
the methods and compositions of the invention.
Transgenic animals may be produced employing the methods
of the present invention to serve as a model system for
the study of various disorders and for screening of
drugs that modulate such disorders.

A "transgenic" plant or animal refers to a
genetically engineered plant or animal, or offspring of
genetically engineered plants or animals. A transgenic
plant or animal usually contains material from at least
one unrelated organism, such as a virus. The term
"animal" as used in the context of transgenic organisms
means all species except human. It also includes an
individual animal in all stages of development,
including embryonic and fetal stages. Farm animals
(e.g., chickens, pigs, goats, sheep, cows, horses,
rabbits and the like), rodents (such as mice), and
domestic pets (e.g., cats and dogs) are included within
the scope of the present invention. In a preferred
embodiment, the animal is a mouse or a rat.

The term "chimeric" plant or animal is used to refer
to plants or animals in which the heterologous gene is
found, or in which the heterologous gene is expressed,
in some but not all cells of the plant or animal.

The term transgenic animal also includes a germ
cell line transgenic animal. A "germ cell line
transgenic animal" is a transgenic animal in which the
genetic information provided by the invention method has
been taken up and incorporated into a germ line cell,
therefore conferring the ability to transfer the
information to offspring. If such offspring, in fact,
possess some or all of that information, then they, too,
are transgenic animals.
Methods of generating transgenic plants and animals
are known in the art and can be used in combination with
the teachings of the present application.

In one embodiment, a transgenic animal of the
present invention is produced by introducing into a
single-cell embryo a nucleic acid construct, comprising
an attD recombination site capable of recombining with
an attT recombination site found within the genome of
the organism from which the cell was derived and a
nucleic acid fragment of interest, in a manner such that
the nucleic acid fragment of interest is stably
integrated into the DNA of germ line cells of the mature
animal and is inherited in normal Mendelian fashion. In
this embodiment, the nucleic acid fragment of interest
can be any one of the fragments described previously.
Alternatively, the nucleic acid sequence of interest can
encode an exogenous product that disrupts or interferes
with expression of an endogenously produced protein of
interest, yielding a transgenic animal with decreased
expression of the protein of interest.

A variety of methods are available for the
production of transgenic animals. A nucleic acid
construct of the invention can be injected into the
pronucleus, or cytoplasm, of a fertilized egg before
fusion of the male and female pronuclei, or injected
into the nucleus of an embryonic cell (e.g., the nucleus
of a two-cell embryo) following the initiation of cell
division (Brinster et al., Proc. Natl. Acad. Sci. USA
82:4438, 1985). Embryos can be infected with viruses,
especially retroviruses, modified with an attD
recombination site and a nucleic acid sequence of
interest. The cell can further be treated with a
site-specific recombinase as described above to promote
integration of the nucleic acid sequence of interest
into the genome.

By way of example only, to prepare a transgenic
mouse, female mice are induced to superovulate. After
being allowed to mate, the females are sacrificed by CO2
asphyxiation or cervical dislocation and embryos are
recovered from excised oviducts. Surrounding cumulus
cells are removed. Pronuclear embryos are then washed
and stored until the time of injection. Randomly
cycling adult female mice are paired with vasectomized
males. Recipient females are mated at the same time as
donor females. Embryos then are transferred surgically.
The procedure for generating transgenic rats is similar
to that of mice (see Hammer et al., Cell 63:1099-1112,
1990). Rodents suitable for transgenic experiments can
be obtained from standard commercial sources such as
Charles River (Wilmington, MA), Taconic (Germantown,
NY), Harlan Sprague Dawley (Indianapolis, IN), etc.
The procedures for manipulation of the rodent
embryo and for microinjection of DNA into the pronucleus
of the zygote are well known to those of ordinary skill
in the art (Hogan et al., supra). Microinjection
procedures for fish, amphibian eggs and birds are
detailed in Houdebine and Chourrout, Experientia
47:897-905, 1991. Other procedures for introduction of
DNA into tissues of animals are described in U.S. Patent
No. 4,945,050 (Sanford et al., July 30, 1990).

Totipotent or pluripotent stem cells derived from
the inner cell mass of the embryo and stabilized in
culture can be manipulated in culture to incorporate
nucleic acid sequences employing invention methods. A
transgenic animal can be produced from such cells
through injection into a blastocyst that is then
implanted into a foster mother and allowed to come to
term.

Methods for the culturing of stem cells and the
subsequent production of transgenic animals by the
introduction of DNA into stem cells using methods such
as electroporation, calcium phosphate/DNA precipitation,
microinjection, liposome fusion, retroviral infection,
and the like are also well known to those of
ordinary skill in the art. See, for example,
Teratocarcinomas and Embryonic Stem Cells, A Practical
Approach, E.J. Robertson, ed., IRL Press, 1987.
Reviews of standard laboratory procedures for
microinjection of heterologous DNAs into mammalian
(mouse, pig, rabbit, sheep, goat, cow) fertilized ova
include: Hogan et al., Manipulating the Mouse Embryo
(Cold Spring Harbor Press 1986); Krimpenfort et al.,
1991, Bio/Technology 9:86; Palmiter et al., 1985, Cell
41:343; Kraemer et al., Genetic Manipulation of the
Early Mammalian Embryo (Cold Spring Harbor Laboratory
Press 1985); Hammer et al., 1985, Nature 315:680;
Pursel et al., 1989, Science 244:1281; Wagner et al.,
U.S. Patent No. 5,175,385; Krimpenfort et al., U.S.
Patent No. 5,175,384.

The final phase of the procedure is to inject
targeted ES cells into blastocysts and to transfer the
blastocysts into pseudopregnant females. The resulting
chimeric animals are bred and the offspring are analyzed
by Southern blotting to identify individuals that carry
the transgene. Procedures for the production of
non-rodent mammals and other animals have been discussed
by others (see Houdebine and Chourrout, supra; Pursel et
al., Science 244:1281-1288, 1989; and Simms et al.,
Bio/Technology 6:179-183, 1988). Animals carrying the
transgene can be identified by methods well known in the
art, e.g., by dot blotting or Southern blotting.

The term transgenic as used herein additionally
includes any organism whose genome has been altered by
in vitro manipulation of the early embryo or fertilized
egg or by any transgenic technology to induce a specific
gene knockout. The term "gene knockout" as used herein
refers to the targeted disruption of a gene in vivo with
loss of function that has been achieved by use of the
invention vector. In one embodiment, transgenic animals
having gene knockouts are those in which the target gene
has been rendered nonfunctional by an insertion
targeted to a pseudo-recombination site located within
the gene sequence.

3.0.0 Gene Therapy and Disorders 

A further embodiment of the invention comprises a
method of treating a disorder in a subject in need of
such treatment. In one embodiment of the method, at
least one cell or cell type (or tissue, etc.) of the
subject has a target recombination sequence (designated
attT). This cell or cells are transformed with a nucleic
acid construct (a "targeting construct") comprising a
second recombination sequence (designated attD) and one
or more polynucleotides of interest (typically a
therapeutic gene). A recombinase that specifically
recognizes the recombination sequences is introduced
into the same cell under conditions such that the
nucleic acid sequence of interest is inserted into the
genome via a recombination event between attT and attD.
Subjects treatable using the methods of the invention
include both humans and non-human animals. Such methods
utilize the targeting constructs and recombinases of the
present invention.

A variety of disorders may be treated by employing
the method of the invention, including monogenic
disorders, infectious diseases, acquired disorders,
cancer, and the like. Exemplary monogenic disorders
include ADA deficiency, cystic fibrosis, familial
hypercholesterolemia, hemophilia, chronic granulomatous
disease, Duchenne muscular dystrophy, Fanconi anemia,
sickle-cell anemia, Gaucher's disease, Hunter syndrome,
X-linked SCID, and the like.

Infectious diseases treatable by employing the
methods of the invention include infection with various
types of virus, including human T-cell lymphotropic
virus, influenza virus, papilloma virus, hepatitis
virus, herpes virus, Epstein-Barr virus, immunodeficiency
viruses (HIV and the like), cytomegalovirus, and the
like. Also included are infections with other
pathogenic organisms such as Mycobacterium tuberculosis,
Mycoplasma pneumoniae, and the like, or parasites such as
Plasmodium falciparum, and the like.

The term "acquired disorder" as used herein refers
to a noncongenital disorder. Such disorders are
generally considered more complex than monogenic
disorders and may result from inappropriate or unwanted
activity of one or more genes. Examples of such
disorders include peripheral artery disease, rheumatoid
arthritis, coronary artery disease, and the like.

A particular group of acquired disorders treatable
by employing the methods of the invention includes
various cancers, including both solid tumors and
hematopoietic cancers such as leukemias and lymphomas.
Solid tumors that are treatable utilizing the invention
method include carcinomas, sarcomas, osteomas,
fibrosarcomas, chondrosarcomas, and the like. Specific
cancers include breast cancer, brain cancer, lung cancer
(non-small cell and small cell), colon cancer,
pancreatic cancer, prostate cancer, gastric cancer,
bladder cancer, kidney cancer, head and neck cancer, and
the like.

The suitability of a particular place in the
genome depends in part on the particular disorder
being treated. For example, if the disorder is a
monogenic disorder and the desired treatment is the
addition of a therapeutic nucleic acid encoding a
non-mutated form of the nucleic acid thought to be the
causative agent of the disorder, a suitable place may be
a region of the genome that does not encode any known
protein and that allows for a reasonable expression
level of the added nucleic acid. Methods of identifying
suitable places in the genome are well known in the art
and are described further in the Examples below.

The nucleic acid construct useful in this
embodiment additionally comprises one or more
nucleic acid fragments of interest. Preferred nucleic
acid fragments of interest for use in this embodiment
are therapeutic genes and/or control regions, as
previously defined. The choice of nucleic acid sequence
will depend on the nature of the disorder to be treated.
For example, a nucleic acid construct intended to treat
hemophilia B, which is caused by a deficiency of
coagulation factor IX, may comprise a nucleic acid
fragment encoding functional factor IX. A nucleic acid
construct intended to treat obstructive peripheral
artery disease may comprise nucleic acid fragments
encoding proteins that stimulate the growth of new blood
vessels, such as vascular endothelial growth factor,
platelet-derived growth factor, and the like. Those of
skill in the art would readily recognize which nucleic
acid fragments of interest would be useful in the
treatment of a particular disorder.

The nucleic acid construct can be administered to
the subject being treated using a variety of methods.
Administration can take place in vivo or ex vivo. By
"in vivo," it is meant in the living body of an animal.
By "ex vivo," it is meant that cells or organs are
modified outside of the body; such cells or organs are
typically returned to a living body.

Methods for the therapeutic administration of
nucleic acid constructs are well known in the art.
Nucleic acid constructs can be delivered with cationic
lipids (Goddard et al., Gene Therapy 4:1231-1236, 1997;
Gorman et al., Gene Therapy 4:983-992, 1997; Chadwick
et al., Gene Therapy 4:937-942, 1997; Gokhale et al.,
Gene Therapy 4:1289-1299, 1997; Gao and Huang, Gene
Therapy 2:710-722, 1995), using viral vectors (Monahan
et al., Gene Therapy 4:40-49, 1997; Onodera et al., Blood
91:30-36, 1998), by uptake of "naked DNA", and the like.
Techniques well known in the art for the transfection of
cells (see discussion above) can be used for the ex vivo
administration of nucleic acid constructs. The exact
formulation, route of administration and dosage can be
chosen by the individual physician in view of the
patient's condition (see, e.g., Fingl et al., 1975, in
"The Pharmacological Basis of Therapeutics", Ch. 1 p. 1).

It should be noted that the attending physician
would know how and when to terminate, interrupt, or
adjust administration due to toxicity, organ
dysfunction, and the like. Conversely, the attending
physician would also know how to adjust treatment to
higher levels if the clinical response were not adequate
(precluding toxicity). The magnitude of an administered
dose in the management of the disorder being treated
will vary with the severity of the condition to be
treated, with the route of administration, and the like.
The severity of the condition may, for example, be
evaluated, in part, by standard prognostic evaluation
methods. Further, the dose and perhaps dose frequency
will also vary according to the age, body weight, and
response of the individual patient.

In general, at least 1-10% of the cells targeted
for genomic modification should be modified in the
treatment of a disorder. Thus, the method and route of
administration will optimally be chosen to modify at
least 0.1-1% of the target cells per administration.
In this way, the number of administrations can be held
to a minimum in order to increase the efficiency and
convenience of the treatment.
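The per-administration and overall targets above can be related by simple arithmetic. The sketch below is illustrative only and rests on an assumption not stated in this specification: that each administration independently modifies the same fraction p of the target cells not yet modified, and that modified cells persist, so that after n administrations the modified fraction is 1 - (1 - p)^n. The function names are hypothetical.

```python
def cumulative_fraction_modified(p_per_dose: float, n_doses: int) -> float:
    """Fraction of target cells modified after n_doses administrations.

    Hypothetical model: each dose independently modifies a fraction
    p_per_dose of the cells that are still unmodified, and modified
    cells persist between doses.
    """
    if not 0.0 <= p_per_dose <= 1.0:
        raise ValueError("p_per_dose must lie between 0 and 1")
    if n_doses < 0:
        raise ValueError("n_doses must be non-negative")
    return 1.0 - (1.0 - p_per_dose) ** n_doses


def doses_needed(p_per_dose: float, target_fraction: float) -> int:
    """Smallest number of doses whose cumulative fraction reaches the target."""
    if not 0.0 < p_per_dose <= 1.0:
        raise ValueError("p_per_dose must lie in (0, 1]")
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must lie in [0, 1)")
    n = 0
    while cumulative_fraction_modified(p_per_dose, n) < target_fraction:
        n += 1
    return n


if __name__ == "__main__":
    # 1% of target cells modified per dose, aiming for 10% overall.
    print(round(cumulative_fraction_modified(0.01, 10), 4))  # 0.0956
    print(doses_needed(0.01, 0.10))  # 11
```

Under this model, modifying 1% of target cells per administration reaches about 9.6% after ten administrations and crosses 10% at the eleventh, while 0.1% per administration needs eleven administrations to reach 1%.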

Depending on the specific conditions being treated,
such agents may be formulated and administered
systemically or locally. Techniques for formulation and
administration may be found in "Remington's
Pharmaceutical Sciences," 1990, 18th ed., Mack
Publishing Co., Easton, PA. Suitable routes may include
oral, rectal, transdermal, vaginal, transmucosal, or
intestinal administration; parenteral delivery,
including intramuscular, subcutaneous, and intramedullary
injections, as well as intrathecal, direct
intraventricular, intravenous, intraperitoneal,
intranasal, or intraocular injections, just to name a
few.

The subject being treated will additionally be
administered a recombinase that specifically recognizes
the attT and attD recombination sequences that are
selected for use. The particular recombinase can be
administered by including a nucleic acid encoding it as
part of a nucleic acid construct, or as a protein to be
taken up by the cells whose genome is to be modified.
Methods and routes of administration will be similar to
those described above for administration of a targeting
construct comprising a recombination sequence and
nucleic acid sequence of interest. The recombinase
protein is likely to be required only for a limited
period of time for integration of the nucleic acid
sequence of interest. Therefore, if introduced as a
recombinase gene, the vector carrying the recombinase
gene will lack sequences mediating prolonged retention.
For example, conventional plasmid DNA decays rapidly in
most mammalian cells. The recombinase gene may also be
equipped with gene expression sequences that limit its
expression. For example, an inducible promoter can be
used, so that recombinase expression can be temporally
limited by limited exposure to the inducing agent. One
such exemplary group of promoters is the
tetracycline-responsive promoters, whose expression can
be regulated using tetracycline or doxycycline.

WO 00/11155
PCT/US99/18987

The invention will now be described in greater
detail by reference to the following non-limiting
Examples.

Examples

Example 1 - Identification of Pseudo-recombination
Sequences

The following example describes the identification
of pseudo-loxP sequences by computer search. Similar
procedures can be used to identify other
pseudo-recombination sequences.

The findpatterns algorithm of the Wisconsin
Software Package Version 9.0 developed by the Genetics
Computer Group (GCG; Madison, WI) was used to screen
all sequences in the GenBank database (Benson et al.,
1998, Nucleic Acids Res. 26, 1-7). Default parameters
are given below. Patterns resembling the wild-type loxP
sequence, called pseudo-loxP sites (ψlox) herein, were
sought. The results from two different search
strategies (Patterns #1 and #2, see below) were pooled.

The wild-type loxP site is 34 base pairs long and
consists of two identical thirteen-basepair palindromes,
separated by an eight-basepair core. It has been
demonstrated that, while strand cutting and exchange
take place in the eight-basepair core, the DNA sequence
of most of this core is not critical, as long as it
matches between the two sites that are to recombine
(Hoess et al., 1986, Nucleic Acids Res. 14, 2287-2300;
Sauer, 1996, Nucleic Acids Res. 24, 4608-4613).
Therefore, most of these bases were set as n's in the
search algorithm. Nucleic acid constructs created using
the principles embodied in the invention allow for full
control over the sequence of the incoming lox site, as
its eight-basepair core can be made to match that of the
genomic site being targeted. This feature of the
recombination reaction gives the desired level of
specificity, allowing targeting of only one ψlox site in
the genome.
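The core-matching principle just described can be illustrated with a short sketch. It assumes the standard 34-bp layout (13-bp arm, 8-bp core, 13-bp arm) and the wild-type loxP sequence ATAACTTCGTATA GCATACAT TATACGAAGTTAT; the two arms are those written in Pattern #1 of this Example, while the GCATACAT core is the well-known loxP value and is not given in this section. The sketch builds an incoming site carrying the loxP arms and the core of a targeted genomic site, as was done with the core of ψlox h7q21 in Example 3. The function names and the genomic sequence used in the example are hypothetical.

```python
# Wild-type loxP: 13-bp arm, 8-bp core, 13-bp arm (34 bp total).
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"
ARM_LEN, CORE_LEN = 13, 8
SITE_LEN = 2 * ARM_LEN + CORE_LEN  # 34


def core_of(site: str) -> str:
    """Return the 8-bp central core of a 34-bp lox-like site."""
    site = site.upper()
    if len(site) != SITE_LEN:
        raise ValueError(f"expected a {SITE_LEN}-bp site, got {len(site)} bp")
    return site[ARM_LEN:ARM_LEN + CORE_LEN]


def donor_site_for(genomic_site: str) -> str:
    """Build an incoming site: wild-type loxP arms around the genomic site's core."""
    return LOXP[:ARM_LEN] + core_of(genomic_site) + LOXP[ARM_LEN + CORE_LEN:]


def cores_match(site_a: str, site_b: str) -> bool:
    """Check the core-identity requirement between two sites."""
    return core_of(site_a) == core_of(site_b)


if __name__ == "__main__":
    # Hypothetical genomic pseudo-site (not a sequence from this document).
    genomic = "ATGACTTCGTATA" + "AAATATTT" + "TATACGAAGTTCT"
    donor = donor_site_for(genomic)
    print(donor)
    print(cores_match(donor, genomic), cores_match(donor, LOXP))
```

The donor keeps the wild-type arms, which Cre recognizes well, and borrows only the core, so it matches the genomic target in the core but no longer matches a wild-type loxP core.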

Previous studies have suggested that the central
bases of the thirteen-basepair palindrome, those closest
to the eight-basepair core, are important for Cre
recognition. Therefore, greater weight was given to
matching the inner four or five positions of the
palindrome.

Using search Pattern #1, a search was constructed
in such a way that the search program would look for
resemblance only in the thirteen-basepair palindromic
regions of the loxP site. The sequence entered into the
search algorithm is shown below:

Pattern #1: ATAACTTCGTATA (n){8} TATACGAAGTTAT

The (n){8} allows the program to substitute any eight
nucleotides in the region between the two
thirteen-basepair inverted repeats and to look for
similarity only to the thirteen-basepair inverted
repeats. Both strands were searched and no gaps or
extensions were allowed.
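The Pattern #1 search can be approximated in a few lines. This is not the GCG findpatterns program; it is a minimal re-implementation under stated assumptions: matching is ungapped, n (written N internally) is a wildcard that never counts as a mismatch, only the "(n){k}" repeat notation used in the patterns above is expanded, and the second strand is handled by also scanning with the reverse complement of the pattern. All function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def expand_pattern(pattern: str) -> str:
    """Turn 'ATAACTTCGTATA (n){8} TATACGAAGTTAT' into a plain 34-character string."""
    compact = pattern.replace(" ", "").upper()
    return re.sub(r"\(N\)\{(\d+)\}", lambda m: "N" * int(m.group(1)), compact)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(window: str, pattern: str) -> int:
    """Mismatches at specified positions only; N matches any base."""
    return sum(1 for base, want in zip(window, pattern) if want != "N" and base != want)


def search_both_strands(sequence: str, pattern: str, max_mismatches: int) -> list:
    """Ungapped scan returning (start, strand, mismatches) for each acceptable window."""
    seq = sequence.upper()
    forward = expand_pattern(pattern)
    hits = []
    # Scanning the given sequence with the reverse-complemented pattern is
    # equivalent to scanning the opposite strand with the original pattern.
    for strand, pat in (("+", forward), ("-", reverse_complement(forward))):
        for start in range(len(seq) - len(pat) + 1):
            mm = count_mismatches(seq[start:start + len(pat)], pat)
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


if __name__ == "__main__":
    pattern1 = "ATAACTTCGTATA (n){8} TATACGAAGTTAT"
    # Hypothetical test sequence: a loxP-like site with one arm mismatch, plus filler.
    test = "GGGG" + "ATGACTTCGTATA" + "AAATATTT" + "TATACGAAGTTAT" + "CCCC"
    for hit in search_both_strands(test, pattern1, max_mismatches=1):
        print(hit)
```

Because Pattern #1 is an inverted repeat, its reverse complement is identical to it, so a hit is reported once for each strand. The same functions accept Pattern #2, whose lowercase n's are expanded to wildcards as well.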

When the search was conducted allowing for a
maximum of eight mismatches, a large number of hits were
obtained in the primate database. The total number of
sequences searched was 73,825, representing 118,684,866
basepairs of sequence. The hits obtained from this
search were then reviewed to identify likely pseudo-loxP
candidates. Sequences having exact matches of at least
four or five nucleotides immediately adjacent to the
core on each side were given preference because
mismatches more than five nucleotides away from the core
on either side may be tolerated to some extent by Cre
recombinase. A similar search was undertaken with the
rodent database.
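The preference rule above (exact matches at the four or five positions immediately adjacent to the core on each side) can be expressed as a small filter. This sketch assumes 34-bp candidates aligned to wild-type loxP, ignores the 8-bp core because that core is matched to the target in the incoming construct, and counts consecutive exact matches outward from the core. The function names, the default threshold of four, and the example mismatches are hypothetical.

```python
from typing import Tuple

LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"
CORE_START, CORE_END = 13, 21  # the 8-bp core occupies 0-based positions 13..20


def flank_matches(site: str, reference: str = LOXP) -> Tuple[int, int]:
    """Consecutive exact matches to the reference, counted outward from the core."""
    site = site.upper()
    if len(site) != len(reference):
        raise ValueError("site and reference must have the same length")
    left = 0
    for i in range(CORE_START - 1, -1, -1):  # innermost left-arm base first
        if site[i] != reference[i]:
            break
        left += 1
    right = 0
    for i in range(CORE_END, len(reference)):  # innermost right-arm base first
        if site[i] != reference[i]:
            break
        right += 1
    return left, right


def is_preferred(site: str, min_flank: int = 4) -> bool:
    """True if at least min_flank core-adjacent bases match loxP on each side."""
    left, right = flank_matches(site)
    return left >= min_flank and right >= min_flank


if __name__ == "__main__":
    near_core = LOXP[:11] + "G" + LOXP[12:]  # hypothetical mismatch 2 bp from the core
    far_out = LOXP[:2] + "T" + LOXP[3:]      # hypothetical mismatch 11 bp from the core
    print(flank_matches(near_core), is_preferred(near_core))
    print(flank_matches(far_out), is_preferred(far_out))
```

A candidate with a mismatch far out in an arm still passes, whereas one with a mismatch next to the core is rejected, mirroring the tolerance described for Cre.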

Search Pattern #2 made use of additional search
criteria derived from structural studies of Cre. The
crystal structure at 2.4 angstrom resolution of Cre
recombinase complexed with loxP DNA reveals that contact
is made between Cre and its target site at certain bases
(Guo et al., 1997, Nature 389, 40-46). Footprinting
with Fe-EDTA using Cre bound to the loxP site also
reveals points of contact between Cre and bases in the
loxP site (Hoess et al., 1990, J. Mol. Biol. 216,
873-882). These bases can be weighted more heavily to
favor matching with the wild-type site. The search
formula for determining a fit to these structural
criteria was as follows for the 34-basepair lox site:

Pattern #2: ATnACnnCnTATA nnnTAnnn TATAnGnnGTnAT

Again, both strands were searched and no gaps or
extensions were allowed. A search demanding four or
fewer mismatches with the specified 16 basepairs yielded
an extensive list of matches with the extant DNA
sequences.

Searches were done in GenBank in the Primate,
Rodent, Invertebrate, Plant, Fungus, and Bacteria
databases. Some of the sites identified using these
methods are shown in Figures 8A and 8B. The core
sequences are shown in boldface type.

Example 2 - In vitro Excision Assay of Pseudo-lox Sites
in Bacteria and Human Cells

The following example demonstrates that the
pseudo-recombination sequences of the invention are
functional as sites for recombination of a nucleic acid
sequence by a site-specific recombinase.

A negative control plasmid, pLCG1 (Figure 1A), was
created by inserting a 4.3-kb XbaI-BspHI fragment
containing the lacZ gene, encoding β-galactosidase,
driven by the CMV promoter (from pCMVSPORT-βgal,
Gibco/BRL) into the EcoRV site of pLitmus29 (New England
Biolabs, Beverly, MA) in the opposite orientation to the
lacZα gene already present in the plasmid. This plasmid
was then used as a base for the construction of other
plasmids used in the excision assay. A very similar
negative control plasmid, pL2|}50, was used in some of
the experiments in place of pLCG1. Briefly, annealed
oligonucleotides containing the lox sites being tested
and a marker restriction enzyme site were directionally
cloned into the BamHI-HindIII sites on one side and the
BglII-XhoI sites on the other side of the CMV-lacZ
construct. This cloning was carried out to ensure that
Cre-induced site-specific recombination would result in
excision of the lacZ marker gene. A schematic
representation of the plasmids is shown in Figures 1A
through 1C. Figure 1D shows the DNA sequences of the
lox sites from pWTLox2 shown in Figure 1B (top line of
Figure 1D) and plasmid pψloxh7q21 shown in Figure 1C
(bottom lines of Figure 1D).

The positive control plasmid used in the excision
assay (pWTLox2, Figure 1B) had the 34-bp wild-type loxP
site cloned into both the BamHI-HindIII site and the
BglII-XhoI site. The test plasmids had a
pseudo-recombination site cloned into the BglII-XhoI
site and a recombination site containing the 13-bp
palindromic repeats of loxP flanking the core sequence
of the pseudo-recombination sequence cloned into the
BamHI-HindIII site.

The bacterial strain used for the excision assay,
294-Cre (Buchholz et al., Nucleic Acids Research
24:3318-3319, 1996), has been designed to constitutively
express Cre recombinase at 37°C.

Approximately 1 ng of the DNA being tested was
electrotransformed into the 294-Cre strain of E. coli
using the Bio-Rad Gene Pulser (Bio-Rad Laboratories, CA)
at a field strength of 12.5 kV/cm, with a capacitance of
25 μF and a resistance of 200 Ω. Aliquots of the
transformation mix were spread on plates containing
ampicillin (100 μg/ml), methicillin (100 μg/ml), and
X-gal (60 μg/ml). The plates were incubated at 37°C for
18 hours, after which they were scored for the presence
of blue and white colonies. Bacteria containing the
parent plasmid pLCG1 generated a blue bacterial colony
when grown on these plates, whereas bacteria containing
a plasmid from which the lacZ sequence had been excised
generated a white colony. The excision frequency was
defined as the ratio of the number of white colonies to
the total number of colonies, expressed as a percentage.
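The excision frequency defined above is a simple ratio of colony counts. The sketch below computes it from white (lacZ excised) and blue (lacZ retained) counts and, as one plausible way of forming a per-site mean from several experiments, averages the per-experiment percentages; how the means in Table 1 were actually computed is not stated, and the function names and counts are hypothetical.

```python
from statistics import mean


def excision_frequency(white: int, blue: int) -> float:
    """Percentage of scored colonies that are white (lacZ excised)."""
    if white < 0 or blue < 0:
        raise ValueError("colony counts must be non-negative")
    total = white + blue
    if total == 0:
        raise ValueError("no colonies were scored")
    return 100.0 * white / total


def mean_excision_frequency(plates):
    """Average of per-experiment percentages; plates holds (white, blue) pairs."""
    return mean([excision_frequency(w, b) for w, b in plates])


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(excision_frequency(white=0, blue=212))             # 0.0
    print(mean_excision_frequency([(90, 10), (80, 20)]))     # 85.0
```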

As shown in Table 1 below, the excision frequency
was close to 100% when the wild-type loxP sequences were
present on the plasmid (positive control), and no
excision was observed when no loxP sites were present.






Table 1

lox Site Tested    Mean Recombination Efficiency (%)
none                0.00
loxP               98.9
ψlox h7q21         11.5
ψlox h7q31          8.9
ψlox hXp22         99.0
ψlox h5p15          1.4
ψlox m9             4.0
ψlox m5            98.7

The results above are based on 4 to 13
separate experiments for each plasmid tested. The data
indicate that pseudo-recombination sequences are
functional, and some pseudo-recombination sequences
(ψlox hXp22 and ψlox m5) promote recombination at very
high frequencies, comparable to the wild-type loxP
sequence.

In conjunction with the data of Example 1, these
recombination efficiency results help identify which
basepairs within loxP are most critical for Cre binding.
A strict correlation between the number of mismatches
and the recombination efficiency was not observed.
Therefore, it is clear that matches at specific
positions are more important than overall homology.
These results are consistent with the idea that the four
bases flanking the core are important, as the ψlox h5p15
site, which has a mismatch in this region while otherwise
having good matches, had the lowest recombination
frequency. The wild-type core sequence was not
required. For example, ψlox m5, which had a
recombination frequency indistinguishable from that of
loxP, had no matches to loxP in the 8-bp core. However,
the best sites had only A and T basepairs in the central
two positions of the core, indicating that this feature
may be important.

Of the four ψlox sequences identified by using Pattern
#2 (ψlox hXp22, ψlox h5p15, ψlox m5, and ψlox m9), two
had the highest excision efficiencies of all the ψlox
sites tested: ψlox hXp22 and ψlox m5, which were
indistinguishable from loxP. On the other hand, ψlox
h5p15, also obtained using Pattern #2, had the lowest
recombination efficiency of the sites tested, probably
because it contained a mismatch in the four positions
nearest the core. These results suggest that while
these first four positions are critical, the requirement
for matching at the first five positions, used in
screening the sites obtained with search Pattern #1, was
overly restrictive. Good results would be obtained by
using Pattern #2 in combination with a stringent
requirement for matching at the first four positions
from the core.

A similar assay was carried out in mammalian cells.
Briefly, a plasmid expressing Cre, pBS185 (Life
Technologies Inc., Grand Island, NY), was modified by the
insertion of a kanamycin resistance gene into the unique
ScaI site to create pBS185-Kan. This modification
renders cells carrying the plasmid resistant to
kanamycin but sensitive to ampicillin. Approximately
2 μg of plasmid pBS185-Kan and 50 ng of one of the
plasmids used in the bacterial assay described above
were transfected into 293 human embryonic kidney cells
(ATCC Accession No. 1573) using LipofectAmine (Life
Technologies) following the manufacturer's
recommendations. The transfected cells were treated
with DNase I 24 hours after transfection. The cells were
grown at 37°C in Dulbecco's Modified Eagle medium
(DMEM) for 72 hours, after which low molecular weight DNA
was isolated from the cells by Hirt extraction (Hirt,
J. Mol. Biol. 26:365-369, 1967). The plasmid DNA was
electrotransformed into E. coli strain DH10B (Life
Technologies) under the conditions described above.
Aliquots of the transformed bacteria were grown on
amp/meth/X-gal plates as described above and scored for
the presence of blue and white colonies.

Exemplary results are shown in Figure 2. The
frequency of excision seen in a mammalian cell
background demonstrates the predictive nature of the
bacterial assay system and demonstrates that the
pseudo-recombination sequences of the invention are
active substrates for recombinase-mediated recombination
in a mammalian cell environment.

The ψlox h7q21 and ψlox hXp22 sites may mediate
integration into the human genome. The ψlox h7q21 site
is located in the q21 region of chromosome 7, while the
ψlox hXp22 site is situated in band p22 of the X
chromosome. The existence of these sequences in the
human genome was verified by sequencing the appropriate
PCR fragments covering the sites from human genomic DNA.
Neither site is located in a coding sequence or a known
gene.

Example 3 - In vitro Transient Integration Assay of
Pseudo-lox Sites in Human Cells

The following example provides a model system for
assessing the ability of the pseudo-recombination
sequences of the invention to promote genomic
modification by site-specific insertion.

The ψlox site to be tested was placed on a plasmid
having tetracycline resistance (Figure 3, upper left).
This plasmid represented the chromosome and was the
recipient for integration events. A lox site having the
wild-type loxP palindromes and the 8-bp core of ψlox
h7q21 was placed next to the lacZ gene on a second
plasmid, this one having ampicillin resistance (Figure
3, upper right). This plasmid represented the incoming
donor vector. These plasmids were constructed as
follows: the plasmid pTM1 was generated by cloning a
155 base-pair AflIII-SnaBI fragment from pLitmus29
containing the multiple cloning site into a unique EcoRV
site of pUC-Tet, a tetracycline-resistant derivative of
pUC19 (C.R. Sclimenti and M.P.C., unpublished). The lox
sites of interest were then cloned into the BglII-XhoI
site of this plasmid to generate the recipient plasmids
for the integration assay (pRWT and pRh7q21).

The plasmid pLGWTLox2 was used as a base for the
construction of the donor plasmids used in the
integration assay. pLGWTLox2 was created by treating
pWTLox2 with EcoRI and subsequent religation to excise
the CMV promoter and create a unique EcoRI site between
one of the loxP sites and the lacZ gene. Complementary
oligonucleotides containing the loxP-derived palindromes
with the core derived from ψlox h7q21, a marker
enzyme site, and EcoRI half-sites at the ends were
annealed and ligated into the unique EcoRI site of
pLGWTLox2 to generate the pDh7q21 donor plasmid for the
transient integration assay.

To perform the assay, 50 ng of the
tetracycline-resistant recipient plasmid and 1 μg of the
ampicillin-resistant donor plasmid were co-transfected
into human 293 cells with LipofectAmine along with 2 μg
of the Cre expression vector pBS185-Kan. The transfected
cells were treated with DNase I 24 hours after
transfection. After 72 hours in human cells, plasmid DNA
was purified by Hirt extraction (Hirt, J. Mol. Biol.
26:365-369, 1967) and returned to the DH10B strain of
E. coli for detection of integration events. Plasmids
that underwent integration were tetracycline resistant
and now also carried lacZ (Figure 3, lower left). They
thus gave rise to blue colonies when plated on LB medium
containing tetracycline and X-gal and incubated
overnight at 37°C. Plasmid DNA was purified from blue
colonies, and those plasmids with the restriction
pattern expected for integration were classified as
integrants. Each blue colony was restreaked on LB
plates containing X-gal and either ampicillin and
methicillin, or tetracycline. One representative
plasmid was sequenced in the relevant regions to
document integration at lox sites. The integration
frequency was calculated as the number of integrants
divided by the total number of tetracycline-resistant
colonies.
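The integration frequency is again a simple ratio, reported in this assay as a percentage. The sketch below computes it from an integrant count and the total number of tetracycline-resistant colonies, and then expresses a pseudo-site frequency relative to the wild-type loxP frequency. The function names and colony counts are hypothetical; the two percentages passed to the second function are the values reported for this assay (0.41% at loxP and 0.12% at ψlox h7q21).

```python
def integration_frequency(integrants: int, tet_resistant_colonies: int) -> float:
    """Integrants as a percentage of all tetracycline-resistant colonies."""
    if tet_resistant_colonies <= 0:
        raise ValueError("need at least one tetracycline-resistant colony")
    if not 0 <= integrants <= tet_resistant_colonies:
        raise ValueError("integrants must lie between 0 and the colony total")
    return 100.0 * integrants / tet_resistant_colonies


def relative_to_wild_type(pseudo_site_pct: float, loxp_pct: float) -> float:
    """Integration at a pseudo-site as a fraction of the loxP value."""
    return pseudo_site_pct / loxp_pct


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(integration_frequency(integrants=3, tet_resistant_colonies=2500))
    # Reported frequencies for this assay: 0.12% (psi-lox h7q21), 0.41% (loxP).
    print(round(relative_to_wild_type(0.12, 0.41), 2))  # 0.29
```

On the reported numbers, integration at ψlox h7q21 ran at roughly 29% of the wild-type loxP frequency.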

The integration assay was performed with recipients bearing the ψlox h7q21 site or with controls having either the wild-type loxP site or no lox site, along with the corresponding donors. The integration frequency at the wild-type loxP site was 0.41%. Integration at the ψlox h7q21 site was readily detectable and occurred at a frequency of 0.12%. Experiments performed with either the recipient alone or the donor alone, in the presence or absence of the Cre expression plasmid, did not yield any integrants. Transfection of the recipient and the donor in the absence of the Cre expression plasmid also failed to yield any integrants. These results demonstrate that detectable site-specific integration occurs at a pseudo-lox site in the human cell environment.
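Because both frequencies were measured in the same assay, they can be compared directly: integration at ψlox h7q21 ran at roughly 29% of the wild-type loxP frequency. The short Python sketch below only restates that arithmetic from the two reported percentages; the function name is illustrative, and the comparison assumes the two sets of transfections were of similar efficiency.

```python
# Integration frequencies reported above, expressed as fractions of the
# tetracycline-resistant colonies that were scored as integrants.
FREQ_LOXP = 0.0041        # wild-type loxP, 0.41%
FREQ_PSI_H7Q21 = 0.0012   # pseudo-lox site psi-lox h7q21, 0.12%


def relative_efficiency(test_freq: float, reference_freq: float) -> float:
    """Return the test-site frequency as a fraction of the reference-site frequency."""
    if reference_freq <= 0:
        raise ValueError("reference frequency must be positive")
    return test_freq / reference_freq


rel = relative_efficiency(FREQ_PSI_H7Q21, FREQ_LOXP)
print(f"psi-lox h7q21 integrates at {rel:.0%} of the loxP frequency")
```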

A second type of shuttle vector system that can be used to model chromosomal integration utilizes modified autonomously replicating vectors such as those described in U.S. Patent No. 5,707,830. These vectors replicate stably in human cells and have a very low endogenous mutation frequency (DuBridge, et al, Mol. Cell. Biol. 7:379-387, 1987). Thus, they provide better models for the chromosome than newly transfected plasmid DNA. One preferred shuttle vector may have EBNA-1 sequences, the EBV family of repeats, oriP or a human chromosomal ori, a bacterial origin of replication, a pseudo-lox sequence, and a marker gene such as one conferring hygromycin resistance. This vector is established in mammalian cells using antibiotic selection. The cells are then transfected with a plasmid expressing Cre and a plasmid having a lox recombination sequence and a second marker gene, such as a gene for chloramphenicol resistance. The assay is performed as described above.
Example 4 - In vitro Chromosomal Assay for Integration Efficiency

The following example evaluates the efficiency at which a heterologous nucleic acid sequence can be inserted into a chromosome at a particular pseudo-recombination site (integration efficiency) and the level of expression of a gene sequence inserted therein.

Bicistronic assay vectors are constructed containing, for example, a gene coding for hygromycin resistance under the control of the thymidine kinase promoter and a gene encoding the enzyme chloramphenicol acetyl transferase (CAT) under the control of the cytomegalovirus immediate early promoter (Wohlgemuth, et al, Gene Therapy 3:503-512, 1996). The former marker is used primarily to assess integration frequency, while the latter marker is useful for sensitively assaying the level and duration of gene expression. The vector additionally carries a lox sequence containing the core of the pseudo-loxP sequence under evaluation.

The test plasmid is transfected into mammalian cells, such as 293S cells (human) or NIH3T3 cells (mouse), along with a Cre-expressing plasmid, such as one of those described above. The transfected cells are grown in the presence of hygromycin, and the number of hygromycin-resistant colonies is scored as a measure of integration frequency. A number of antibiotic-resistant colonies are propagated and analyzed by polymerase chain reaction (PCR) and Southern blotting to determine whether they have an integration event targeted to the correct ψlox site. CAT gene expression is measured as follows. Cell extracts are prepared by standard procedures, normalized for total protein concentration, and assayed for CAT activity as described by Gorman, et al, Proc Natl Acad Sci USA 79:6777, 1982 or Wohlgemuth, supra.

Example 5 - In vivo Assay for Integration

The following assay evaluates the ability of a 
recombination sequence to promote integration of a 
heterologous nucleic acid sequence into a genome in 
vivo. 

The in vivo integration and expression of the CAT gene using the teaching of the invention are evaluated essentially as described by Zhu, et al, Science 261:209-211, 1993. Two vectors, one containing a lox recombination sequence and the CAT gene and one expressing Cre, are mixed with liposomes that have a net cationic charge, for example, liposomes containing N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTMA) (Felgner, et al, Proc Natl Acad Sci USA 84:7413, 1987) and dioleoyl phosphatidylethanolamine (DOPE) in a 1:1 ratio. The ratio of DNA to liposomes is typically 1:1. The liposome/DNA mixture is typically injected intravenously through the tail vein of test mice in 200 μl of 5% dextrose in water.

At various time points, starting at 24 hours post-injection, test mice are sacrificed, and various tissues are harvested and homogenized. Cleared homogenates are assayed for CAT enzyme activity using a scintillation counting assay (Seed and Sheen, Gene 67:271-277, 1988) with the following modifications: 0.3 μCi of 14C-labeled chloramphenicol (55 mCi/mmol) is added to 200 nmol of acetyl coenzyme A in a final volume of 122 μl. CAT activity is expressed either as CAT enzyme activity per weight of tissue or per milligram of protein in each tissue extract. Tissue extracts are prepared by standard procedures, and total protein is determined using standard protocols (Bradford, Lowry, and the like).
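The stated radiolabel quantity fixes the amount of chloramphenicol in the reaction: 0.3 μCi at 55 mCi/mmol is about 5.5 nmol, roughly 45 μM in 122 μl, so the 200 nmol of acetyl coenzyme A (about 1.6 mM) is in large molar excess. The Python sketch below reproduces only that unit conversion; it assumes the stated specific activity applies to the whole labeled stock and that no unlabeled chloramphenicol is added, neither of which the text states explicitly.

```python
# Reaction components from the modified CAT scintillation assay above.
activity_uCi = 0.3                     # 14C-chloramphenicol added, microcuries
specific_activity_mCi_per_mmol = 55.0  # assumed to apply to all chloramphenicol present
acetyl_coa_nmol = 200.0
final_volume_ul = 122.0


def nmol_from_activity(activity_uci: float, sa_mci_per_mmol: float) -> float:
    """Convert radioactivity (uCi) to amount (nmol) using the specific activity."""
    activity_mci = activity_uci / 1000.0        # uCi -> mCi
    mmol = activity_mci / sa_mci_per_mmol       # mCi / (mCi/mmol) -> mmol
    return mmol * 1e6                           # mmol -> nmol


def micromolar(nmol: float, volume_ul: float) -> float:
    """Concentration in uM; nmol per uL equals mM, so multiply by 1000."""
    return nmol / volume_ul * 1000.0


cam_nmol = nmol_from_activity(activity_uCi, specific_activity_mCi_per_mmol)
print(f"chloramphenicol: {cam_nmol:.2f} nmol ({micromolar(cam_nmol, final_volume_ul):.0f} uM)")
print(f"acetyl-CoA: {micromolar(acetyl_coa_nmol, final_volume_ul) / 1000:.2f} mM")
```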


Example 6 - Intramolecular Integration Assay for a Site-Specific Recombinase in E. coli

The following example describes a rapid assay to measure site-specific integration by a recombinase. This assay was used to measure integration of the wild-type φC31 attB sequence into the wild-type φC31 attP sequence in the presence of the φC31 integrase. A similar assay can be used to measure integration mediated by other recombinases of interest, such as the integrases of phages R4 and TP-901.

Integrase-expressing plasmids were constructed as follows. The φC31 integrase gene was amplified by the polymerase chain reaction from the plasmid pIJ8600 containing the φC31 integrase and attP (M. Bibb, John Innes Institute, Norwich, U.K.) with the following primers: 5'GAACTAGTCGTAGGGTCGCCGACATGACAC3' and 5'GTGGATCCGGGTGTCTCGCTACGCCGCTAC3'. The PCR product was ligated into linear pCR2.1 (Invitrogen, Carlsbad, CA) at the T overhang to make the plasmid pTA-Int. The lacZ gene was removed from pCMVSPORTβGal (Life Technologies, Grand Island, NY) by digestion with the restriction enzymes BamHI and SpeI and replaced by the integrase gene from pTA-Int with BamHI- and SpeI-compatible ends, creating the plasmid pCMVInt (Figure 4B), which expresses φC31 integrase in mammalian cells under control of the cytomegalovirus immediate early promoter.
The integrase gene was subsequently removed from pCMVSPORTInt by digestion with BamHI and PstI and ligated into pACYC177, which carries ampicillin and kanamycin resistance genes (S. Cohen, Stanford University, Stanford, CA), that had also been treated with BamHI and PstI, removing part of the ampicillin resistance gene. Finally, the lacZ promoter was removed from pBCSK+ (Stratagene, La Jolla, CA) by digestion with SacI and SapI. The integrase-containing pACYC plasmid was digested with PstI and SacI, and the lacZ promoter was inserted upstream of the integrase gene with a linker (5'GCTCGGCCAAAAAGGCCTGCA3', 5'GGCCTTTTTGGCCG3'), creating the plasmid pInt (Figure 4A), which expresses the φC31 integrase under control of the lacZ promoter.

The intramolecular integration assay plasmid was constructed as follows. The bacterial attachment site for φC31 (attB) was amplified by PCR from Streptomyces lividans genomic DNA (S. Cohen, Stanford University, Stanford, CA) with the primers 5'CAGGTACCGTCGACGATGTAGGTCACGGTC3' and 5'GTCGACATGCCCGCCGTGACCG3'. This attB fragment was ligated into linear pCR2.1 at the T overhang sites to create the plasmid pTA-attB, containing a 285 bp attB region. The phage attachment site (attP) was amplified by PCR from pIJ8600 with the primers 5'CGACTAGTACTGACGGACACACCGAA3' and 5'GTACTAGTCGCGCTCGCGCGACTGACG3' and ligated into linear pCR2.1 at the T overhang sites to create the plasmid pTA-attP, containing a 221 bp attP region. The lacZα sequence was removed from pBCSK+ by digestion with PvuI and KpnI, treatment with T4 polymerase, and religation. The full-length lacZ gene from pCMVSPORTβGal was removed by digestion with SpeI and HindIII and cloned into the SpeI and HindIII sites of the lacZα-deficient pBCSK+ to make pBCβGal. The attP was then removed from pTA-attP by SpeI digestion and cloned into the SpeI site of pBCβGal. The attB was then removed from pTA-attB by SalI digestion and cloned into the SalI site of the attP-containing pBCβGal to create the assay plasmid pBCPB+ (Figure 4C), in which the TTG cores of the att sites are in the same orientation. In addition, a control plasmid, pBCPB-, in which the att sites were in opposite orientations, was also constructed.

The pInt plasmid was then transformed into DH10B bacteria, grown under kanamycin selection, and made electrocompetent by a standard protocol. The resulting electrocompetent DHInt cells were used in the bacterial intramolecular integration assay, conducted as follows. 200 ng of the assay plasmid of choice was electroporated into DHInt cells, allowed to recover for one hour, spread on plates containing chloramphenicol and Xgal, and grown at 37° C. If an intramolecular integration event occurs, the lacZ gene located between the attB and attP sites is excised, and the resulting colony is white. The frequency of intramolecular integration was therefore calculated as the number of white colonies divided by the total number of colonies.

When this assay was carried out in DHInt bacteria using pBCPB+, all colonies were white, indicating efficient integration. Thousands of colonies were assayed for each plasmid tested. The same plasmid produced only blue colonies in DH10B bacteria, which lack the integrase gene. These results verify that the assay plasmid carried functional attB and attP sites and that the φC31 integrase functioned efficiently in E. coli with no added cofactors. In contrast, the plasmid pBCPB-, which carried the att sites in inverted orientation, resulted in blue colonies, because the lacZ gene was merely inverted, not excised, by the integration reaction. The assay plasmid with no att sites, pBCSK-βgal, also yielded only blue colonies in DHInt cells. Restriction enzyme digestion of plasmid DNA purified from a representative number of white colonies verified that the intramolecular integration reaction occurred as expected and resulted in deletion of lacZ between the attB and attP sites.
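The colony-color readout follows from the relative orientation of the two att sites: direct repeats excise lacZ (white), inverted repeats only flip it (blue), and a plasmid without both sites is unchanged (blue). The Python model below is a deliberately simplified sketch of that logic, not a description of the actual plasmid maps; the feature layouts, the `hybrid_att` label, and all function names are hypothetical, and it ignores circular topology and the exact attL/attR products that were confirmed here by restriction digestion.

```python
from typing import List, Tuple

# A feature is (name, orientation), orientation +1 or -1. This is a simplified,
# linearized picture of a circular assay plasmid; layouts below are hypothetical.
Feature = Tuple[str, int]


def recombine(features: List[Feature]) -> List[Feature]:
    """Model one integrase reaction between attB and attP on the same molecule.

    Same orientation (direct repeats): the segment between the sites is excised,
    leaving one hybrid site on the retained circle.
    Opposite orientation (inverted repeats): the segment is inverted in place.
    Without both sites, the plasmid is returned unchanged.
    """
    names = [name for name, _ in features]
    if "attB" not in names or "attP" not in names:
        return list(features)
    i, j = sorted((names.index("attB"), names.index("attP")))
    before, between, after = features[:i], features[i + 1:j], features[j + 1:]
    if features[i][1] == features[j][1]:
        return before + [("hybrid_att", features[i][1])] + after
    flipped = [(name, -orient) for name, orient in reversed(between)]
    return (before + [("hybrid_att", features[i][1])] + flipped
            + [("hybrid_att", features[j][1])] + after)


def colony_color(features: List[Feature]) -> str:
    """On Xgal plates a retained lacZ gives blue; loss of lacZ gives white."""
    return "blue" if any(name == "lacZ" for name, _ in features) else "white"


def white_fraction(colors: List[str]) -> float:
    """Recombination frequency as scored in the assay: white / total colonies."""
    return colors.count("white") / len(colors)


# Hypothetical layouts standing in for pBCPB+, pBCPB- and the no-att control;
# "cat" is the chloramphenicol-resistance marker of the pBC backbone.
direct = [("cat", 1), ("attP", 1), ("lacZ", 1), ("attB", 1)]
opposite = [("cat", 1), ("attP", 1), ("lacZ", 1), ("attB", -1)]
no_att = [("cat", 1), ("lacZ", 1)]

for label, plasmid in [("direct", direct), ("opposite", opposite), ("no att", no_att)]:
    print(label, colony_color(recombine(plasmid)))
```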

Example 7 - Intramolecular Integration Assay in Mammalian Cells

The following example demonstrates the ability of phage φC31 integrase to integrate sequences site-specifically and efficiently in a mammalian cell environment.

To perform the intramolecular integration assay in human cells, the same pBCPB+ plasmid was used as in the bacterial assay of Example 6. The pCMVInt plasmid was substituted for pInt to ensure expression of φC31 integrase in mammalian cells. Subconfluent (60-80%) 60 mm plates of human 293 cells grown in DMEM supplemented with 9% fetal bovine serum and 1% penicillin/streptomycin were transfected with Lipofectamine (Life Technologies) at a ratio of 6 μg Lipofectamine per μg of DNA. Experiments were performed with 100 ng of the assay plasmid of interest and 2 μg of pCMVInt. Controls performed in each experiment included no DNA, pCMVInt only, pBCSK-βgal (assay plasmid with no att sites), pBCSK-βgal + pCMVInt, and pBCPB+ alone.

Twenty-four hours after transfection, the medium was supplemented with 50 units/ml of DNase I to reduce the background of untransfected DNA. Three days after transfection, the cells were harvested, and low molecular weight DNA was recovered using the Hirt procedure (Hirt, J. Mol. Biol. 26:365-369, 1967). A portion of this DNA was electroporated into competent DH10B E. coli cells and spread on plates containing chloramphenicol and Xgal to select only for the assay plasmid. The intramolecular integration frequency was determined as the number of white colonies divided by the total number of colonies.

Using this assay system in mammalian cells, the φC31 integrase was shown to catalyze recombination between the full-length attB and attP sites of pBCPB+ at a frequency of 50.6% (mean of 16 experiments, standard error = 2.32%). This frequency is likely to be an underestimate, as plasmid DNA that never came in contact with the φC31 integrase was probably present despite efforts to remove untransfected DNA with DNase I. It is clear that the φC31 integrase catalyzes efficient site-specific integration in mammalian cells.

To verify site-specific recombination, 96 white colonies were picked, and plasmid DNA was prepared and examined by restriction digestion. Of these, 97% contained a plasmid that represented the expected site-specific recombinant. The remaining colonies contained plasmids that carried large rearrangements that disrupted lacZ. The low-frequency rearrangement of transfected plasmids was observed with all plasmids, with and without integrase and att sites, and can be attributed to transfection-associated mutation of newly introduced DNA.
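The summary statistics above can be unpacked a little further. Because the standard error of the mean is the standard deviation divided by the square root of the number of experiments, a standard error of 2.32% over 16 experiments implies a between-experiment standard deviation of about 9.3%, and "97% of 96 colonies" corresponds to about 93 colonies. The Python sketch below shows this arithmetic; the 95% interval uses the tabulated two-sided t critical value for 15 degrees of freedom (2.131) and assumes the per-experiment frequencies are roughly normally distributed, which the text does not state.

```python
import math

# Values reported for the mammalian intramolecular assay.
mean_pct = 50.6      # mean recombination frequency, %
sem_pct = 2.32       # standard error of the mean, %
n_experiments = 16

# SEM = SD / sqrt(n), so the between-experiment SD can be recovered.
sd_pct = sem_pct * math.sqrt(n_experiments)

# Two-sided 95% confidence interval with n - 1 = 15 degrees of freedom;
# 2.131 is the tabulated critical value t(0.975, 15). Assumes approximate normality.
T_CRIT_15DF = 2.131
half_width = T_CRIT_15DF * sem_pct
ci = (mean_pct - half_width, mean_pct + half_width)

# Restriction analysis: 97% (a rounded figure) of 96 white colonies were correct.
expected_recombinants = round(0.97 * 96)

print(f"SD = {sd_pct:.2f}%, 95% CI = {ci[0]:.1f}% to {ci[1]:.1f}%")
print(f"about {expected_recombinants} of 96 colonies carried the expected product")
```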


Example 8 - Determination of the Minimal Sizes of Recombination Sequences

The following example describes the process for determining the minimal sequences needed for recognition and recombination by a site-specific recombinase. This process was used to determine the minimal wild-type attB and attP sequences functionally recognized by the φC31 integrase in bacterial and mammalian cell environments. A similar process can be used to identify the minimal sequences recognized by other recombinases of interest, such as the integrases of phages R4 and TP-901. The minimal attB and attP sequences can then be used to identify pseudo-recombination sequences, for example as described above for the Cre-lox system.

Prior to this study, the minimal sizes for the φC31 attachment sites, attB and attP, had not been determined. The attB site had been localized to approximately 280 basepairs and the attP region had been localized to 86 basepairs (Thorpe and Smith, Proc. Natl. Acad. Sci. USA, 1998). The intramolecular integration assay described in Example 6 was used to determine the minimal functional sizes for these att sites. Short double-stranded adaptor molecules containing att sites of various lengths were created by annealing single-stranded oligonucleotides. These shorter sites were used to replace the full-length att sites in the pBCPB+ assay plasmid, and recombination efficiencies were determined by electroporation into E. coli.

To determine the minimal functional size of attB, the 278-basepair full-length attB flanked by BamHI and HindIII sites was removed. This fragment was replaced by a series of synthetic shorter sites having ends permitting their orientation-appropriate cloning into pBCPB+. The resulting plasmids were electroporated into DHInt E. coli cells, and recombinants were scored as white colonies, as described in Example 6 above. Figure 5 (left side) shows the results of these experiments. AttB sites of 50, 40, 35, and 34 basepairs all provided full recombination function, i.e., they functioned at 100% of the efficiency of the full-length attB. Reduction of the site to 33 basepairs produced a marked decrease in recombination activity. Therefore, 34 basepairs was determined to be the minimal functional size of attB.
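The rule used here to call the minimal size (the shortest site in an unbroken series of fully functional truncations, stopping at the first loss of function) can be written as a small Python function. This is an illustrative sketch with a hypothetical function name; it reduces each result to "full function or not" because the text gives only qualitative outcomes, and it does not reproduce the measured 33-basepair value shown in Figure 5.

```python
from typing import Dict, Optional


def minimal_full_function_size(full_function: Dict[int, bool]) -> int:
    """Smallest site length L such that every tested length >= L retained full function.

    Scans from the longest tested site downward and stops at the first loss
    of function, so an isolated functional shorter site is not counted.
    """
    minimal: Optional[int] = None
    for length in sorted(full_function, reverse=True):
        if not full_function[length]:
            break
        minimal = length
    if minimal is None:
        raise ValueError("the longest tested site did not retain full function")
    return minimal


# Qualitative attB results from the text (True = 100% of full-length attB).
attb_results = {50: True, 40: True, 35: True, 34: True, 33: False}
print(minimal_full_function_size(attb_results))  # 34
```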

Once attB was determined to be 34 basepairs long, attP was subjected to a similar set of reductions. The reduced attP sites were assayed on a plasmid carrying attB34 rather than full-length attB. To perform these experiments, the full-length attP flanked by SacII and SpeI sites was replaced with a series of synthetic annealed oligonucleotides bearing ends permitting their correct orientation-specific cloning into pBCPB+-attB34. Figure 5 (right side) depicts the results of these experiments. The function of attP dropped off as its size was reduced from 40 to 36 basepairs. The DNA sequence revealed that the 38-basepair site encompassed the major inverted repeat evident in attP. However, it was apparent from these data that the next two outermost basepairs conveyed some function (P39A&B). From this analysis, the minimal size of attP was determined to be 39 basepairs.

To determine the efficiency with which the reduced att sites function in mammalian cells, the same panel of plasmids was analyzed using the intramolecular integration assay described in Example 7. Each of the assay plasmids was transfected into human 293 cells along with pCMVInt. After 72 hours in the mammalian cells, the plasmid DNA was purified by the method of Hirt (Hirt, J. Mol. Biol. 26:365-369, 1967) and transformed into DH10B E. coli cells for scoring of recombinants. The results of these experiments showed that minimal sizes for attB and attP similar to those determined in E. coli also applied in mammalian cells. In mammalian cells, the same reduced att sequences that worked at 100% efficiency in E. coli achieved approximately 60-90% of the efficiency of the full-length att sites, likely because the overall reaction is somewhat less efficient in the mammalian cell environment.

These experiments showed that the minimal attB and attP recombination sites are 34 and 39 basepairs long, respectively. These sizes are similar to that of the 34-basepair loxP site. A recombination site of this size is expected to have active pseudo-recombination sites in large genomes, such as those of mammals and most plants. Thus, it is statistically expected that pseudo-recombination sites for the φC31 integrase will occur in these genomes. These pseudo-recombination sites represent targets for chromosome engineering.
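The statistical expectation can be made semi-quantitative with a random-sequence model. An exact match to a 34-basepair site is essentially never expected by chance (about 2 × 10^-11 occurrences in a 3 × 10^9-basepair genome), but sequences differing from it at up to roughly eight positions are, so the pseudo-recombination sites anticipated here would be partial matches. The Python sketch below assumes uniform, independent base composition, counts both strands, and uses an assumed genome size of 3.0 × 10^9 bp; real genomes are not random and the integrase does not tolerate mismatches equally at every position, so this is an order-of-magnitude illustration only.

```python
from math import comb


def expected_near_matches(site_len: int, max_mismatches: int, genome_bp: float) -> float:
    """Expected number of genomic windows within `max_mismatches` substitutions of a site.

    Random-sequence model: each base is A, C, G or T with probability 1/4,
    independently; both strands are scanned, giving about 2 * genome_bp windows.
    """
    windows = 2 * genome_bp
    # Number of sequences at Hamming distance <= max_mismatches from the site.
    neighbours = sum(comb(site_len, k) * 3 ** k for k in range(max_mismatches + 1))
    return windows * neighbours / 4 ** site_len


GENOME_BP = 3.0e9  # assumption: roughly the size of a haploid human genome
SITE_LEN = 34      # minimal attB size determined above

exact = expected_near_matches(SITE_LEN, 0, GENOME_BP)
first_k = next(k for k in range(SITE_LEN + 1)
               if expected_near_matches(SITE_LEN, k, GENOME_BP) >= 1)
print(f"expected exact matches: {exact:.1e}")
print(f"mismatches allowed before >= 1 site is expected: {first_k}")
```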






WO 00/11155 



PCT/US99/18987 



Example 9 - Determination of the Amount of Heterogeneity Tolerated in the Core Sequence of a Recombinase Site

The amount of heterogeneity tolerated in the 3-bp core sequence of the attB and attP sequences recognized by the φC31 integrase was determined. Similar methods can be used to determine the amount of heterogeneity tolerated in the core sequences of sites recognized by other recombinases of interest, such as the integrases of phages R4 and TP-901.

The φC31 integrase catalyzes recombination between attB and attP sites. These sites have minimal functional lengths of 34 and 39 basepairs, respectively. While largely distinct in sequence, attB and attP share a three-basepair common core sequence, TTG, that includes the crossover region. In the case of the 8-basepair core region of the loxP site targeted by Cre recombinase, it has been found that its sequence is largely unimportant, as long as it matches between the two recombining sites. To determine whether this behavior applied to the core region of the attB and attP sites of the φC31 integrase, the effects of mutations within this core region were examined.

A panel of plasmids was generated in which either attB, attP, or both sites were altered with a specific single base change. These changes were then assayed with the intramolecular integration assay in E. coli described in Example 6. A recombination event results in excision of the lacZ gene located between the att sites. Thus, when an assay plasmid is transformed into bacteria expressing φC31 integrase, a site-specific recombination event is scored as a white colony.
The TTG core was mutated at each position individually to every other base. The effects of these mutations in attB were investigated when paired with a wild-type attP. Conversely, the effects of a mutant attP paired with a wild-type attB were measured. By combining attB and attP sites that contained identical mutations, it was determined whether the core sequences merely needed to match each other for efficient recombination.
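The mutation design described above (each core position changed to each of the other three bases, tested as mutant attB with wild-type attP, wild-type attB with mutant attP, and matched mutants in both sites) defines a panel of 9 × 3 = 27 site pairs. The Python sketch below enumerates that panel from the text's description; the function names are illustrative, and the set of plasmids actually built and tested is the one shown in Figure 6.

```python
from typing import List, Tuple

WT_CORE = "TTG"
BASES = "ACGT"


def single_substitutions(core: str) -> List[str]:
    """All cores differing from `core` at exactly one position."""
    mutants = []
    for pos, wt_base in enumerate(core):
        for base in BASES:
            if base != wt_base:
                mutants.append(core[:pos] + base + core[pos + 1:])
    return mutants


def core_panel(core: str = WT_CORE) -> List[Tuple[str, str]]:
    """(attB core, attP core) pairs for the three pairing schemes."""
    panel = []
    for mutant in single_substitutions(core):
        panel.append((mutant, core))    # mutant attB, wild-type attP
        panel.append((core, mutant))    # wild-type attB, mutant attP
        panel.append((mutant, mutant))  # identical mutation in both sites
    return panel


panel = core_panel()
print(len(panel))  # 27
print(panel[:3])
```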

To carry out these experiments, oligonucleotides bearing the mutations to be tested were synthesized in the context of attB34 or attP40 (see Example 8). The mutant oligonucleotides were annealed and cloned into the chloramphenicol-resistant intramolecular integration assay vector pBCPB+ to replace the wild-type attB or attP, as in Example 8. Individual plasmids containing the mutation of interest were assayed for recombination in E. coli strain DHInt, which carries the kanamycin-resistant integrase expression plasmid pInt described in Example 6. Assay plasmid DNA (2 ng) was electroporated into DHInt, and after a 1 hour recovery period at 37° C in rich medium, the transformations were plated on LB agar containing 25 mg/ml chloramphenicol, 60 mg/ml kanamycin, and 50 mg/ml X-gal. The plates were incubated overnight (16-18 hours) at 37° C, after which blue and white colonies were counted. The recombination fraction was expressed as the percentage of white colonies out of total colonies. The results of these experiments are shown in Figure 6.

The first and third positions of the core showed some flexibility, while the center position did not. The first position appeared to tolerate only pyrimidines; the CTG double mutant worked well. The third position of attP could be changed to any base, and the third position of attB could be changed to the other purine. Overall, the pattern of base substitutions tolerated in the recognition sites for the φC31 integrase more closely resembled the tolerance for substitutions typical of the outer palindromes, rather than the core, of the loxP site. Thus, unlike the situation in the Cre-loxP system, the φC31 integrase has strong base preferences within the cores of its attB and attP recombination sites, and merely matching any two three-basepair core sequences will not suffice to generate efficient recombination in this system.
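The qualitative rules in the preceding paragraph can be encoded as a per-position lookup: pyrimidines at the first position, T only at the center, either purine at the third position of attB, and any base at the third position of attP. The Python sketch below is a hypothetical encoding of that verbal summary only; it treats positions independently and as all-or-none, whereas Figure 6 reports measured recombination fractions that this table does not reproduce.

```python
# Qualitative summary of the text only: for each core position, the bases
# reported as tolerated when the mutant site is paired with a wild-type partner.
TOLERATED = {
    "attB": ("CT", "T", "AG"),    # pyrimidine | center fixed | either purine
    "attP": ("CT", "T", "ACGT"),  # pyrimidine | center fixed | any base
}


def core_tolerated(site: str, core: str) -> bool:
    """True if every position of `core` uses a base listed as tolerated for `site`."""
    allowed = TOLERATED[site]
    if len(core) != len(allowed):
        raise ValueError("core must be three bases long")
    return all(base in options for base, options in zip(core.upper(), allowed))


for site, core in [("attB", "CTG"), ("attB", "TTC"), ("attP", "TTC"), ("attP", "TAG")]:
    print(site, core, core_tolerated(site, core))
```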

Example 10 - Bimolecular Integration Assay into a Model Chromosome in Mammalian Cells

The following example demonstrates the ability of phage φC31 integrase to integrate sequences site-specifically and efficiently into a model chromosome in a mammalian cell environment.

Example 7 demonstrated that the φC31 integrase efficiently catalyzed site-specific intramolecular integration in mammalian cells. The next step was to show that the integrase could catalyze efficient site-specific integration of exogenous DNA into mammalian chromosomes in cell culture. EBV-based plasmids provide easy and useful models for chromosomes. EBV vectors exist in the nucleus, replicate in synchrony with the chromosomes, and bear chromatin indistinguishable from that of the chromosomes. They can be easily purified from cells and transformed into E. coli for rapid scoring of integration events. Thus, they have great utility in characterizing the integration reaction in human cells.

In these experiments, a kanamycin-resistant EBV plasmid was equipped with an attB site and established in human 293 cells to create a stable attB-containing human cell line. An ampicillin-resistant plasmid carrying attP and lacZ was then co-transfected into the attB cell line, along with a plasmid expressing the φC31 integrase. To assay for integration products, plasmid DNA was extracted after three days and transformed into bacteria. Blue colonies that grew on plates containing kanamycin, ampicillin, and Xgal were scored as integrants, while the total colony number was obtained by plating on kanamycin alone.

The attB and attP plasmids needed for this study were constructed as follows. The target EBV-based plasmids were based on p220.2 (DuBridge et al, 1987). The control plasmid p220K was made by inserting the kanamycin resistance gene from the Kan-resistant Genblock (Amersham Pharmacia, Piscataway, NJ) into the XmnI site of the ampicillin resistance gene of p220.2. To make attB-containing p220 plasmids, the ampicillin-resistance gene of p220.2 was removed by digestion with BspHI. The kanamycin resistance gene described above was isolated by digestion with PstI and cloned into the amp- p220.2 with BspHI-PstI linkers (5'CATGAGGCCAAAAAGGCCTGCA3' and 5'GGCCTTTTTGGCCT3') to create the plasmid p220K. The full-length attB was removed from the plasmid pTA-attB (Example 6) by SalI digestion and cloned into the SalI site of p220K, creating the plasmid p220KattBfull (Figure 4D). The 35-basepair attB was cloned into the SalI and BamHI sites of p220K using the oligonucleotides 5'gatccgatatcgcgcccggggagcccaagggcacgccctggcaccg3' and 5'tcgacggtgccagggcgtgcccttgggctccccgggcgcgatatcg3', creating the plasmid p220KattB35.
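The two attB35 oligonucleotides are designed to anneal into a 42-basepair duplex carrying a 5' GATC overhang at one end and a 5' TCGA overhang at the other, matching the ends left by BamHI and SalI. The Python sketch below checks that complementarity from the sequences as printed; it is an illustrative check only (function and constant names are hypothetical), it assumes 4-nucleotide 5' overhangs at both ends, and it does not verify where the 35 basepairs of attB lie within the duplex.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only, any case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhangs(top: str, bottom: str, overhang_len: int = 4):
    """If the two oligos anneal into a duplex with a 5' overhang of
    `overhang_len` nt at each end, return (top_overhang, bottom_overhang);
    otherwise return None."""
    top, bottom = top.upper(), bottom.upper()
    if top[overhang_len:] != revcomp(bottom[overhang_len:]):
        return None
    return top[:overhang_len], bottom[:overhang_len]


# Oligonucleotides for the 35-bp attB adaptor, as given in the text.
ATTB35_TOP = "gatccgatatcgcgcccggggagcccaagggcacgccctggcaccg"
ATTB35_BOTTOM = "tcgacggtgccagggcgtgcccttgggctccccgggcgcgatatcg"

# 5' overhangs left by BamHI (G^GATCC) and SalI (G^TCGAC).
COMPATIBLE_ENDS = {"GATC": "BamHI", "TCGA": "SalI"}

ends = five_prime_overhangs(ATTB35_TOP, ATTB35_BOTTOM)
print(ends, [COMPATIBLE_ENDS.get(e) for e in ends] if ends else None)
```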

These EBV plasmids, p220K, p220KattBfull, and p220KattB35, were established in human 293 cells as follows. 293 cells were grown in DMEM containing 9% fetal bovine serum and 1% penicillin/streptomycin to approximately 70% confluency in 100 mm plates. 8 μg of p220KattBfull, p220KattB35, or the control p220K was introduced by transfection with Lipofectamine according to the manufacturer's protocol. At 24 hours post-transfection, the cells were split 1:4, and at 48 hours post-transfection, hygromycin selection (350 μg/ml) was begun. Eleven to 14 days after starting selection, the cells were expanded and frozen down.

The attP-containing plasmid pTSAD (Figure 4E) was constructed as follows. A multiple cloning site (oligos: 5'AATTACCGCGGGGCGCGCCGTTTAAACGCATGCCAATTGGGCCGGCCG3' and 5'AATTCGGCCGGCCCAATTGGCATGCGTTTAAACGGCGCGCCCCGCGGT3') was cloned into the EcoRI site of the plasmid pWTLox2 (Example 2) upstream of lacZ, regenerating one EcoRI site. The attP site was removed from the plasmid pTA-attP (Example 6) by digestion with EcoRI and cloned into the regenerated EcoRI site of pWTLox2 to create the plasmid pES1. The lacZ promoter was removed from pBCSK+ by digestion with PvuII and SacII and cloned into pES1 that had been digested with PmeI and SacII. The region containing attP, the lacZ promoter, and the lacZ gene was removed by digestion with BamHI and BglII and cloned into the BamHI site of pTSA30 (Gregory Phillips, Iowa State University, Ames, Iowa) to create the donor plasmid pTSAD. pTSA30 and its pTSAD derivative are temperature sensitive for plasmid replication in E. coli.

To perform the integration assay, EBV plasmid-containing cells were grown to confluency in DMEM containing 9% fetal bovine serum, 1% penicillin/streptomycin, and 200 μg/ml hygromycin in 10 cm plates. These plates were split into eight 60 mm plates and grown in the above medium without hygromycin for 24-48 hours, until they were approximately 60-80% confluent. pCMVInt (Example 7, Figure 4B) and pTSAD were transfected in equimolar amounts (10 μg total DNA) using 50 μl Superfect (Qiagen, Valencia, CA) according to the manufacturer's protocol. As controls, no DNA, 4 μg pCMVInt, or 6 μg pTSAD was cotransfected with salmon sperm DNA (to 10 μg). In addition, an equimolar amount of a plasmid encoding the green fluorescent protein (a derivative of pEGFP-C1, Clontech, Palo Alto, CA) with salmon sperm DNA to 10 μg was transfected in parallel into the EBV plasmid-containing cells to monitor transfection efficiency.
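For an equimolar co-transfection, each plasmid's share of the total mass is proportional to its length, because the average mass per basepair is the same for any double-stranded DNA. The 4 μg pCMVInt and 6 μg pTSAD single-plasmid controls suggest that the 10 μg equimolar mix was split roughly 4:6, which would correspond to plasmids whose sizes are in about a 2:3 ratio. The Python sketch below shows the calculation; the plasmid sizes are hypothetical placeholders, not values given in the text, and should be replaced with the actual map sizes.

```python
from typing import Dict


def equimolar_masses(total_ug: float, plasmid_bp: Dict[str, int]) -> Dict[str, float]:
    """Split a total DNA mass so each plasmid is present in equal moles.

    Moles are proportional to mass / length, so for equal moles each
    plasmid's mass must be proportional to its length.
    """
    total_bp = sum(plasmid_bp.values())
    return {name: total_ug * bp / total_bp for name, bp in plasmid_bp.items()}


# Hypothetical plasmid sizes in bp, for illustration only.
masses = equimolar_masses(10.0, {"pCMVInt": 6000, "pTSAD": 9000})
print(masses)  # {'pCMVInt': 4.0, 'pTSAD': 6.0}
```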

At 2.5-3 hours after transfection, the Superfect was removed from the cells and replaced with serum-containing medium. Cells were fed with medium containing serum and 50 U/ml DNase I 24 hours after transfection and harvested 72 hours after transfection. Low molecular weight DNA was purified by Hirt extraction (Hirt, J. Mol. Biol. 26:365-369, 1967) and transformed into DH10B E. coli by electroporation. Also, 24 hours after transfection, transfection efficiency was measured by counting the green fluorescent protein-expressing cells relative to the total number of cells. The transfection efficiencies typically ranged from 6-18%. Because untransfected cells would have no opportunity to undergo integration but would still contribute EBV plasmids to the bacterial assay in the form of white colonies, the transfection efficiency was needed to obtain the correct integration frequency.

In a typical experiment, 15 μl of a transformation was spread on each of three plates containing kanamycin, Xgal, and IPTG, while 150 μl of the same transformation was spread on each of three plates containing ampicillin, kanamycin, Xgal, and IPTG. The bacteria were grown overnight at 42° C for approximately 16 h. The elevated temperature prevented replication of pTSAD, which has a temperature-sensitive plasmid origin of replication. Integrants were scored as the blue colonies on the plates containing both kanamycin and ampicillin. For each set of transfections, integration frequency was calculated as the number of blue colonies on kanamycin and ampicillin plates divided by the total number of colonies on kanamycin plates multiplied by 10, which corrects for the ten-fold larger volume plated on the kanamycin and ampicillin plates. Raw numbers for integration frequency were divided by transfection efficiency to obtain accurate values for integration frequency.
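The two corrections in this calculation, for the ten-fold difference in plated volume and for the fraction of cells actually transfected, can be combined in one function. The Python sketch below illustrates the arithmetic; the colony counts and transfection efficiency in the example are hypothetical numbers chosen for illustration, not data from Figure 7, and the counts are assumed to be summed over the three replicate plates of each type.

```python
def integration_frequency(blue_kan_amp: int, total_kan: int,
                          volume_ratio: float, transfection_efficiency: float) -> float:
    """Corrected integration frequency for the EBV-plasmid assay.

    blue_kan_amp: blue colonies on kanamycin + ampicillin plates (summed over plates)
    total_kan: all colonies on kanamycin-only plates (summed over plates)
    volume_ratio: volume plated on Kan+Amp plates / volume on Kan plates (150/15 = 10)
    transfection_efficiency: fraction of cells transfected (e.g. from GFP counts)
    """
    if total_kan <= 0 or not 0 < transfection_efficiency <= 1:
        raise ValueError("need colonies on kanamycin plates and an efficiency in (0, 1]")
    raw = blue_kan_amp / (total_kan * volume_ratio)
    return raw / transfection_efficiency


# Hypothetical counts, for illustration only (not data from the text).
freq = integration_frequency(blue_kan_amp=30, total_kan=2000,
                             volume_ratio=150 / 15, transfection_efficiency=0.10)
print(f"{freq:.2%}")  # 1.50%
```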

Figure 7 lists the integration frequencies obtained with each of the EBV plasmids and the negative controls. Each line of the figure represents a minimum of three separate transfections. For p220K, which lacks the attB site, a negligible frequency of blue colonies was detected. Upon analysis, these plasmids were not integrants, but rather products of homologous recombination that occurred through common amp sequences on the two plasmids. For p220KattB35, carrying a minimally sized attB, a significant number of blue colonies was detected. When corrected for the transfection efficiency in these experiments, the integration frequency was 1.7%. For p220KattBfull, the integration frequency was even higher, at 7.5%. This increase presumably reflects a favorable sequence context for the full attB site compared to the reduced site. Controls in which pCMVInt, pTSAD, and each of the EBV plasmids, p220K, p220KattBfull, and p220KattB35, were co-transformed directly into E. coli yielded negligible numbers of blue colonies (0.002% or less). These controls confirmed that the high-frequency integration events scored above occurred in human cells, not in E. coli.

The integration frequency into an attB site located on an EBV plasmid is high, several orders of magnitude higher than the frequencies of random integration or homologous recombination, highlighting the utility of this invention. Furthermore, the integrants are site-specific, as indicated by restriction mapping of more than 160 of the blue colonies from the experiments with p220KattB35 and p220KattBfull. In addition, two integrants each from the experiments with p220KattB35 and p220KattBfull were analyzed at the DNA sequence level across the junctions of the integration site, confirming that exact site-specific integration occurred between attB and attP.

Figure 7 indicates that, as expected, the reaction requires the presence of both the integrase gene (pCMVInt) and the attP-bearing donor (pTSAD). Because EBV vectors are nuclear, chromatinized minichromosomes, the high integration frequency obtained in this system is predictive of the expected integration frequencies into att sites located on the chromosomes.

5 

Example 11 - Assay for Integration into the Chromosomes 
of Mammalian Cells 

The following example describes methods used to
demonstrate the ability of phage ΦC31 integrase to
site-specifically integrate sequences into mammalian
chromosomes.

Cell lines carrying the wild-type ΦC31 attB site
are prepared by transfecting human 293 cells with
Lipofectamine and a plasmid carrying the attB sequence
and the hygromycin resistance gene. The cells are grown
in DMEM containing hygromycin, and resistant colonies are
propagated to mass culture. Integration of the attB
sequence is verified by Southern blot analysis using
plasmid sequences as probes. These cell lines are then
transfected, using Lipofectamine, with a plasmid containing
the attP sequence and a neomycin/G418 resistance gene,
together with a plasmid expressing the ΦC31 integrase gene
under control of the CMV promoter. G418 is
added to the DMEM growth medium approximately 48 hours
after transfection. Selection is maintained for
approximately ten days, after which the number of
colonies is scored.

Higher numbers of neomycin-resistant colonies are
seen in cells co-transfected with the ΦC31
integrase-expressing plasmid than in cells that do not
receive the integrase. Likewise, higher numbers of
neomycin-resistant colonies are obtained in cell lines
carrying attB than in the parent 293 cell line lacking attB.
These results suggest that the ΦC31 integrase enzyme can
catalyze the integration of heterologous sequences into
a mammalian genome, both at an integrated attB sequence
and at endogenous pseudo-recombination sequences.

Similar experiments can be conducted using cell
lines carrying an integrated, hygromycin-resistant attP
plasmid, followed by transfection with a neomycin-resistant
attB plasmid, to demonstrate integration into
the integrated wild-type attP site and into attP pseudo-sites.
Furthermore, similar experiments can be conducted in
other cell types, such as those derived from other
mammalian species or from plants, to test integration
activity in these cellular backgrounds.

While the foregoing has been described with reference to
particular embodiments of the invention, it will be
appreciated by those skilled in the art that changes in
these embodiments may be made without departing from the
principles and spirit of the invention, the scope of
which is defined by the appended claims.

What is claimed is:



1. A method of site-specifically integrating a
polynucleotide sequence of interest in a genome of a
eucaryotic cell, said method comprising:

introducing (i) a circular targeting construct,
comprising a first recombination site and the
polynucleotide sequence of interest, and (ii) a
site-specific recombinase into the eucaryotic cell, wherein
the genome of said cell comprises a second recombination
site native to the genome and recombination between the
first and second recombination sites is facilitated by
the site-specific recombinase,

maintaining the cell under conditions that allow
recombination between said first and second
recombination sites, wherein the recombination is
mediated by the site-specific recombinase and the result
of the recombination is site-specific integration of the
polynucleotide sequence of interest in the genome of the
eucaryotic cell.



2. The method of claim 1, wherein the
site-specific recombinase is selected from the group
consisting of Cre recombinase, Cre-like recombinase, Flp
recombinase, and R recombinase.



3. The method of claim 2, wherein the recombinase
normally facilitates recombination between two
recombination sites, wherein said sites are essentially
the same, and where the sites are designated
recombinase-mediated-recombination sites (RMRS).
4. The method of claim 3, wherein the RMRS
comprises a first DNA sequence (RMRS5'), a core region
A, and a second DNA sequence (RMRS3') in the relative
order RMRS5'-core region A-RMRS3'.

5. The method of claim 4, wherein said RMRS5' and
RMRS3' comprise palindromic sequences.

6. The method of claim 5, wherein RMRS5' and
RMRS3' comprise palindromic sequences of approximately
10-20 base pairs, and the core region comprises
approximately 3-15 base pairs.

7. The method of claim 4, wherein said RMRS is a
loxP site and the recombinase is Cre.

8. The method of claim 4, wherein said RMRS is a 
FRT site and the recombinase is FLP. 

9. The method of claim 4, wherein (i) the second
recombination site is a pseudo-RMRS site, and said
second recombination site comprises a first DNA sequence
(attT5'), a core region B, and a second DNA sequence
(attT3') in the relative order attT5'-core region B-attT3',
and (ii) said first recombination site is a
hybrid-recombination site comprising RMRS5'-core region
B-RMRS3'.

10. The method of claim 4, wherein (i) the second
recombination site is a pseudo-RMRS site, and said
second recombination site comprises a first DNA sequence
(attT5'), a core region B, and a second DNA sequence
(attT3') in the relative order attT5'-core region B-attT3',
and (ii) said first recombination site comprises
attT5'-core region B-attT3'.

11. The method of claim 1, wherein the
site-specific recombinase is a recombinase encoded by a phage
selected from the group consisting of ΦC31, TP901-1, and
R4.



12. The method of claim 11, wherein the
recombinase normally facilitates recombination between a
bacterial genomic recombination site (attB) and a phage
genomic recombination site (attP).

13. The method of claim 12, wherein (i) the second
recombination site comprises a pseudo-attP site, and
(ii) said first recombination site comprises the attB
site.



14. The method of claim 13, wherein said
recombinase is encoded by ΦC31.

15. The method of claim 12, wherein (i) the second
recombination site comprises a pseudo-attB site, and
(ii) said first recombination site comprises the attP
site.



16. The method of claim 15, wherein said
recombinase is encoded by ΦC31.

17. The method of claim 15, wherein said
recombinase is encoded by phage R4.
18. The method of claim 15, wherein said
recombinase is encoded by phage TP901-1.

19. The method of claim 12, wherein (i) attB
comprises a first DNA sequence (attB5'), a bacterial
core region, and a second DNA sequence (attB3') in the
relative order attB5'-bacterial core region-attB3', (ii)
attP comprises a first DNA sequence (attP5'), a phage
core region, and a second DNA sequence (attP3') in the
relative order attP5'-phage core region-attP3', and
(iii) wherein the recombinase mediates production of
recombination-product sites that can no longer act as a
substrate for the recombinase, said recombination-product
sites comprising the relative order
attB5'-recombination-product site-attP3' and
attP5'-recombination-product site-attB3'.

20. The method of claim 19, wherein (i) the second
recombination site is a pseudo-attP site, and said
second recombination site comprises a first DNA sequence
(attT5'), a core region B, and a second DNA sequence
(attT3') in the relative order attT5'-core region B-attT3',
(ii) said first recombination site is an attB
site comprising attB5'-bacterial core region-attB3', and
(iii) wherein the recombinase mediates production of
recombination-product sites that can no longer act as a
substrate for the recombinase, said recombination-product
sites comprising the relative order
attT5'-recombination-product site-attB3'{polynucleotide of
interest}attB5'-recombination-product site-attT3'.
21. The method of claim 19, wherein (i) the second
recombination site is a pseudo-attB site, and said
second recombination site comprises a first DNA sequence
(attT5'), a core region B, and a second DNA sequence
(attT3') in the relative order attT5'-core region B-attT3',
(ii) said first recombination site is an attP
site comprising attP5'-phage core region-attP3', and
(iii) wherein the recombinase mediates production of
recombination-product sites that can no longer act as a
substrate for the recombinase, said recombination-product
sites comprising the relative order
attT5'-recombination-product site-attP3'{polynucleotide of
interest}attP5'-recombination-product site-attT3'.

22. The method of claim 1, wherein said circular
targeting construct further comprises a bacterial origin
of replication.

23. The method of claim 1, wherein said circular
targeting construct further comprises a selectable
marker.

24. The method of claim 23, wherein said
selectable marker provides for either positive or
negative selection.

25. The method of claim 1, wherein said 
polynucleotide sequence of interest comprises a 
transcriptional promoter sequence. 

26. The method of claim 1, wherein said
polynucleotide sequence of interest comprises at least 
one expression cassette. 

27. The method of claim 26, wherein said
expression cassette comprises a promoter operably linked
to a polynucleotide sequence that encodes a product.

28. The method of claim 27, wherein said product
is an RNA molecule.

29. The method of claim 27, wherein said product 
is a polypeptide. 

30. The method of claim 1, wherein the
site-specific recombinase is introduced into the cell as a
polypeptide.

31. The method of claim 1, wherein the
site-specific recombinase is introduced into the cell as a
polynucleotide encoding the recombinase.

32. The method of claim 31, wherein an expression
cassette comprises the polynucleotide encoding the
recombinase.

33. The method of claim 32, wherein the expression 
cassette is carried on a transient expression vector. 

34. The method of claim 32, that further comprises
introducing the site-specific recombinase into the cell
as a polypeptide.
35. The method of claim 1, wherein said
recombinase is introduced into the cell before 
introducing the circular targeting construct. 

36. The method of claim 1, wherein said
recombinase is introduced into the cell concurrently
with introducing the circular targeting construct.

37. The method of claim 1, wherein said
recombinase is introduced into the cell after
introducing the circular targeting construct.

38. A vector for site-specific integration of a
polynucleotide sequence into the genome of a eucaryotic
cell, said vector comprising,

(i) a circular backbone vector,

(ii) a polynucleotide of interest operably linked
to a eucaryotic promoter, and

(iii) a first recombination site, wherein the
genome of said cell comprises a second recombination
site native to the genome and recombination between the
first and second recombination sites is facilitated by a
site-specific recombinase.

39. The vector of claim 38, wherein said
site-specific recombinase is derived from a bacteriophage.

40. The vector of claim 38, wherein said circular 
backbone vector is a procaryotic or eucaryotic vector. 

41. The vector of claim 38, wherein said
polynucleotide of interest operably linked to a
eucaryotic promoter further comprises additional control
elements.

42. The vector of claim 38, wherein the
site-specific recombinase is selected from the group
consisting of Cre recombinase, Cre-like recombinase, Flp
recombinase, and R recombinase.

43. The vector of claim 39, wherein the
site-specific recombinase is a recombinase encoded by a phage
selected from the group consisting of ΦC31, TP901-1, and
R4.

44. The vector of claim 43, wherein the
site-specific recombinase is encoded by phage ΦC31.

45. The vector of claim 39, wherein the
recombinase normally facilitates recombination between a
bacterial genomic recombination site (attB) and a phage
genomic recombination site (attP).

46. The vector of claim 45, wherein said first 
recombination site is either attB or attP. 

47. The vector of claim 46, wherein said
recombinase is the site-specific recombinase encoded by
phage ΦC31.

48. The vector of claim 38, wherein said circular
backbone vector further comprises a bacterial origin of
replication.
49. The vector of claim 38, wherein said circular
backbone vector further comprises a selectable marker. 

50. The vector of claim 49, wherein said
selectable marker provides for either positive or
negative selection.

51. A kit for site-specific integration of a
polynucleotide sequence into the genome of a eucaryotic
cell, said kit comprising,

(i) a vector of any of claims 38-50, and

(ii) a site-specific recombinase.

52. The kit of claim 51, wherein the site-specific
recombinase is provided as a polypeptide composition.

53. The kit of claim 51, wherein the site-specific
recombinase is provided as a polynucleotide encoding the
recombinase.

54. The kit of claim 51, wherein the site- 
specific recombinase is provided as both a polypeptide 
and a polynucleotide encoding the recombinase. 

55. A eucaryotic cell having a modified genome,
said modified genome comprising an integrated
polynucleotide sequence of interest whose integration
was mediated by a recombinase and wherein said
integration was into a recombination site native to the
eucaryotic cell genome and said integration created a
recombination-product site comprising said
polynucleotide sequence.
56. The cell of claim 55, wherein said
recombination-product site comprises the components
attT5'-recombination-product site-attB3' and
attB5'-recombination-product site-attT3', wherein (i) the
native recombination site is a pseudo-attP site, and
said native recombination site comprises a first DNA
sequence (attT5'), a core region B, and a second DNA
sequence (attT3') in the relative order attT5'-core
region B-attT3', (ii) said integrated polynucleotide
sequence comprises a first recombination site comprising
an attB site comprising attB5'-bacterial core
region-attB3', and (iii) wherein the recombinase mediates
production of recombination-product sites that can no
longer act as a substrate for the recombinase, said
recombination-product sites comprising the relative
order attT5'-recombination-product
site-attB3'{polynucleotide of interest}attB5'-recombination-product
site-attT3'.

57. The cell of claim 55, wherein said
recombination-product site comprises the components
attT5'-recombination-product site-attB3' and
attB5'-recombination-product site-attT3', wherein (i) the
native recombination site is a pseudo-attB site, and
said native recombination site comprises a first DNA
sequence (attT5'), a core region B, and a second DNA
sequence (attT3') in the relative order attT5'-core
region B-attT3', (ii) said integrated polynucleotide
sequence comprises a first recombination site comprising
an attP site comprising attP5'-phage core region-attP3',
and (iii) wherein the recombinase mediates production
of recombination-product sites that can no longer act as
a substrate for the recombinase, said recombination-product
sites comprising the relative order
attT5'-recombination-product site-attP3'{polynucleotide of
interest}attP5'-recombination-product site-attT3'.

58. A transgenic animal comprising at least one 
cell of any of claims 55-57. 

59. A transgenic plant comprising at least one
cell of any of claims 55-57.

60. A method of treating a disorder in a subject in
need of such treatment, said method comprising:

site-specifically integrating a polynucleotide
sequence of interest in a genome of at least one cell of
the subject, where said site-specific integration of the
polynucleotide sequence of interest is performed as
described in any of claims 1-37, wherein said
polynucleotide facilitates production of a product that
treats said disorder in the subject.

61. The method of claim 60, wherein said
site-specific integration is carried out in vivo in the
subject.

62. The method of claim 60, wherein said site- 
specific integration is carried out ex vivo in cells and 
the cells are introduced into the subject. 

63. A method of modifying a genome of a cell, said
method comprising

inserting an attB or an attP recombination site
into the genome of a cell, wherein (i) said
recombination site is recognized by a recombinase, and
(ii) said cell normally does not comprise the attB or
attP site, to provide a modified genome containing an
attB or an attP site.

64. The method of claim 63, wherein said cell is a
eucaryotic cell.

65. The method of claim 63, wherein said inserting
is carried out by transforming the cell with a
polynucleotide containing the recombination site under
conditions such that the polynucleotide is inserted into
the genome.

66. The method of claim 63, further comprising
introducing (i) a circular targeting construct,
comprising an attP recombination site and a
polynucleotide sequence of interest, and (ii) a
site-specific recombinase into the eucaryotic cell, wherein
the genome of said cell comprises an attB recombination
site and recombination between the attP and attB
recombination sites is facilitated by the site-specific
recombinase,

maintaining the cell under conditions that allow
recombination between said attP and attB recombination
sites, wherein the recombination is mediated by the
site-specific recombinase and the result of the
recombination is site-specific integration of the
polynucleotide sequence of interest in the genome of the
cell.
67. The method of claim 63, further comprising
introducing (i) a circular targeting construct,
comprising an attB recombination site and a
polynucleotide sequence of interest, and (ii) a
site-specific recombinase into the eucaryotic cell, wherein
the genome of said cell comprises an attP recombination
site and recombination between the attB and attP
recombination sites is facilitated by the site-specific
recombinase,

maintaining the cell under conditions that allow
recombination between said attB and attP recombination
sites, wherein the recombination is mediated by the
site-specific recombinase and the result of the
recombination is site-specific integration of the
polynucleotide sequence of interest in the genome of the
cell.

68. The method of claim 63, wherein the
site-specific recombinase is a recombinase encoded by a phage
selected from the group consisting of ΦC31, TP901-1, and
R4.

69. An expression cassette, comprising

a polynucleotide encoding a site-specific
recombinase, wherein (i) the recombinase is encoded by a
phage selected from the group consisting of ΦC31,
TP901-1, and R4, and (ii) the recombinase is operably linked
to a eucaryotic promoter.

70. The expression cassette of claim 69, further
comprising a backbone vector that is a procaryotic or
eucaryotic vector.
71. The expression cassette of claim 69, wherein
said recombinase operably linked to a eucaryotic 
promoter further comprises additional control elements. 